Re: problem with facets - out of memory exception

2013-12-19 Thread Marc Sturlese
Have you tried reindexing with DocValues? Fields used for faceting are then
stored on disk rather than in RAM via the FieldCache. If you have enough
memory they will be held in the OS cache instead of on the Java heap.
This is also good for GC when committing.
http://wiki.apache.org/solr/DocValues
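
A field set up for DocValues looks something like this in schema.xml (the
field and type names here are just an example):

<field name="category" type="string" indexed="true" stored="false" docValues="true"/>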



--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-facets-out-of-memory-exception-tp4107390p4107407.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.6 optimize and field cache question

2013-07-10 Thread Marc Sturlese
Not a solution for the short term, but it sounds like a good use case for
migrating to Solr 4.x and using DocValues instead of the FieldCache for
faceting.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-optimize-and-field-cache-question-tp4076398p4076822.html
Sent from the Solr - User mailing list archive at Nabble.com.


Listeners, cores and Similarity

2013-08-16 Thread Marc Sturlese
Hey there,
I'm testing a custom similarity which loads data from an external file
located in solr_home/core_name/conf/. I load the data from the file into a Map
in the init method of the SimilarityFactory. I would like to reload that Map
every time a commit happens, or every X hours.
To do that I've thought of implementing a custom listener which populates a
custom cache (acting as the Map) every time a new searcher is opened. The
problem is that from the SimilarityFactory or Similarity class I can't
access the Solr caches; I just have access to the SolrParams.
The only way I see to populate the Map outside the Similarity class is
making it static, but I would like to avoid that.
Any advice?
Thanks in advance




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Listeners-cores-and-Similarity-tp4085083.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tweaking boosts for more search results variety

2013-09-10 Thread Marc Sturlese
This is deprecated, but it may still be helpful if you want to re-sort
some documents:
https://issues.apache.org/jira/browse/SOLR-1311



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tweaking-boosts-for-more-search-results-variety-tp4088302p4089044.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.3. Grouping vs DeDuplication and Deduplication Use Case

2011-08-30 Thread Marc Sturlese
Deduplication uses Lucene's indexWriter.updateDocument with the signature
term. I don't think it's possible, as a default feature, to choose which
document to keep; the "original" will always be the last one indexed.
/IndexWriter.updateDocument
Updates a document by first deleting the document(s) containing term and
then adding the new document. The delete and then add are atomic as seen by
a reader on the same index (flush may happen only after the add)./

With grouping you have all your documents indexed, so it gives you more
flexibility.
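
For reference, the kind of update chain the Deduplication wiki describes in
solrconfig.xml (the signature field and the list of fields are just the
wiki's example values):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>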

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-3-Grouping-vs-DeDuplication-and-Deduplication-Use-Case-tp3294711p3295023.html
Sent from the Solr - User mailing list archive at Nabble.com.


Adding a DocSet as a filter from a custom search component

2011-10-25 Thread Marc Sturlese
Hey there,
I'm wondering if there's a cleaner way to do this:
I've written a SearchComponent that runs as a last-component. In the prepare
method I build a DocSet (SortedIntDocSet) based on whether some values of the
FieldCache of a given field satisfy certain rules (if the rules are
satisfied, the docId is added to the DocSet). I want to use this DocSet as a
filter for the main query. Right now I'm cloning the existing filters of the
request (if there are any) into a filter list, adding mine there, and then
adding the result to the request context:

  // ... build myDocSet
  DocSet ds = rb.req.getSearcher().getDocSet(filtersCloned).andNot(myDocSet);
  rb.setFilters(null);   // you'll see why below
  rb.req.getContext().put("newFilters", ds);

Then, to apply the DocSet containing all the filters, in the QueryComponent's
process method I do:

SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
if (rb.req.getContext().containsKey("newFilters")) {
  cmd.setFilter((DocSet) rb.req.getContext().get("newFilters"));
}

Since I've set rb.setFilters(null) there are no exceptions and it works.
This definitely looks nasty; I would rather not have to touch the
QueryCommand this way. Any suggestions?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-a-DocSet-as-a-filter-from-a-custom-search-component-tp3452449p3452449.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: changing omitNorms on an already built index

2011-10-27 Thread Marc Sturlese
As far as I know there's no problem with this. You have to reindex and that's
it.
On which kind of field are you changing the norms? (You will only see
changes on text fields.)
Using debugQuery=true you can see how norms affect the score (in case you
haven't omitted them).

--
View this message in context: 
http://lucene.472066.n3.nabble.com/changing-omitNorms-on-an-already-built-index-tp3459132p3459169.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collection Distribution vs Replication in Solr

2011-10-27 Thread Marc Sturlese
Replication is easier to manage and a bit faster. See the performance
numbers: http://wiki.apache.org/solr/SolrReplication

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-Distribution-vs-Replication-in-Solr-tp3458724p3459178.html
Sent from the Solr - User mailing list archive at Nabble.com.


performance sorting multivalued field

2010-06-18 Thread Marc Sturlese

hey there!
can someone explain to me how multivalued fields impact sorting?
I have read in other threads how they affect faceting, but I couldn't
find any info about the impact when sorting.
Thanks in advance

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p905943.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance sorting multivalued field

2010-06-18 Thread Marc Sturlese

I mean sorting the query results, not facets.
I am asking because I have added a multivalued field that has at most 10
values, but 70% of the docs have just 1 or 2 values in this multiValued
field. I am not doing faceting.
Since I added the multiValued field, the Java old generation seems to fill up
more quickly and GCs are happening more often.
I don't see why a multiValued field would use more memory for plain
relevance queries. That's why I think it may be the sort queries' fault...
Any explanation or advice?
Thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p906115.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance sorting multivalued field

2010-06-19 Thread Marc Sturlese

Hey Erik,
I am currently sorting by a multiValued field. It appears to work, although
you may not know which of the values of the multiValued field puts the
document in that position. That is fine for me; I don't mind for my tests.
What I need to know is whether there is any performance issue in all of this.
Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p907502.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can query boosting be used with a custom request handlers?

2010-06-21 Thread Marc Sturlese

Maybe this helps:
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-query-boosting-be-used-with-a-custom-request-handlers-tp884499p912691.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance sorting multivalued field

2010-06-22 Thread Marc Sturlese

>>Well, sorting requires that all the unique values in the target field
>>get loaded into memory
That's what I thought, thanks.

>>But a larger question is whether what your doing is worthwhile
>>even as just a measurement. You say
>>"This is good for me, I don't care for my tests". I claim that
>>you do care
I just like to play with things. First I checked the behavior of sorting on a
multiValued field, and what I noticed was the following. Let's say you have
docs with a field called 'num':
doc1->num:2; doc2->num:1,num:4; doc3->num:5
Sorting asc by the field num, what I get is: doc2,doc1,doc3.
The behavior seems to always be the same (I am not saying it works like that,
but it's what I've seen in my examples).
After seeing that I just decided to check the performance. The point is
simply curiosity.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p913626.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone use hadoop+solr?

2010-06-22 Thread Marc Sturlese

I think there are people using this patch in production:
https://issues.apache.org/jira/browse/SOLR-1301
I have tested it myself, indexing data from CSV and from HBase, and it works
properly.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914553.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr with hadoop

2010-06-22 Thread Marc Sturlese

I think a good solution could be to use Hadoop with SOLR-1301 to build the
Solr shards and then use Solr distributed search against these shards (you
will have to copy them from HDFS to local disk to search against them).
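
For example, once each shard has been copied to a local core, the distributed
query is just the usual shards parameter (hosts and paths are made up):

http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr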
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914576.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone use hadoop+solr?

2010-06-22 Thread Marc Sturlese

Well, the patch consumes the data from a CSV. You have to modify the input to
use TableInputFormat (I don't remember if it's called exactly like that) and
it will work.
Once you've done that, you have to specify as many reducers as shards you
want.

I know 2 ways to index using Hadoop.
method 1 (SOLR-1301 & Nutch):
-Map: just gets the data from the source and creates key-value pairs
-Reduce: does the analysis and indexes the data
So, the index is built on the reducer side.

method 2 (hadoop lucene index contrib):
-Map: does the analysis and opens an indexWriter to add docs
-Reducer: merges the small indexes built in the map
So, the indexes are built on the map side.
method 2 has no good integration with Solr at the moment.

In the JIRA (SOLR-1301) there's a good explanation of the advantages and
disadvantages of indexing on the map or reduce side. I recommend reading all
the comments on the JIRA in detail to know exactly how it works.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914625.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone use hadoop+solr?

2010-06-24 Thread Marc Sturlese

Hi Otis, just out of curiosity, which strategy do you use? Indexing on the map
or the reduce side?
Do you use it to build shards or a single monolithic index?
Thanks

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p919335.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance sorting multivalued field

2010-06-24 Thread Marc Sturlese

Thanks, that's very useful info. However, I can't reproduce the error. I've
created an index where all documents have a multivalued date field and each
document has a minimum of one value in that field (most of the docs have 2
or 3). So, the number of un-inverted term instances is greater than
the number of documents.
*There are lots of docs with the same value; I mention that because I
suppose that having the same value has nothing to do with the number of
un-inverted term instances.

I never get the error explained here:
http://lucene.472066.n3.nabble.com/Different-sort-behavior-on-same-code-td503761.html
Could it be that Solr 1.4 or Lucene 2.9.1 handles this and avoids the error?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p920464.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance sorting multivalued field

2010-06-25 Thread Marc Sturlese

>>*There are lots of docs with the same value; I mention that because I
suppose that having the same value has nothing to do with the number of
un-inverted term instances.
It does have something to do with it; I've been able to reproduce the error
by setting different values on each doc's field:

HTTP Status 500 - there are more terms than documents in field "date", but
it's impossible to sort on tokenized fields java.lang.RuntimeException:
there are more terms than documents in field "id", but it's impossible to
sort on tokenized fields at
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:706)...
 

But it's already fixed for the Lucene 2.9.4, 3.0.3, 3.1 and 4.0 versions:
https://issues.apache.org/jira/browse/LUCENE-2142
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p921752.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Recommended MySQL JDBC driver

2010-06-26 Thread Marc Sturlese

I suppose you use batchSize=-1 to index that amount of data. From connector
5.1.7 onwards there's this parameter:
netTimeoutForStreamingResults
The default value is 600. Increasing it may help (2400, for example?).
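
For example, in the DIH data-config the parameter can go straight into the
JDBC URL (host, database and credentials are placeholders):

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb?netTimeoutForStreamingResults=2400"
            batchSize="-1" user="user" password="pass"/>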
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Recommended-MySQL-JDBC-driver-tp817458p924107.html
Sent from the Solr - User mailing list archive at Nabble.com.


ending an app that uses EmbeddedSolrServer

2010-07-13 Thread Marc Sturlese

Hey there,
I've done some tests with a custom Java app that uses EmbeddedSolrServer to
create an index.
It works OK and I am able to build the index, but I've noticed that after the
commit and optimize are done, the app never terminates.
How should I end it? Is there any way to tell the EmbeddedSolrServer to
close?
Thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/ending-an-app-taht-uses-EmbeddedSolrServer-tp963573p963573.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ending a java app that uses EmbeddedSolrServer

2010-07-13 Thread Marc Sturlese

Seems that coreContainer.shutdown() solves the problem.
Anyone doing it in a different way?
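
For anyone finding this in the archive, a minimal sketch of the pattern
(Solr 1.4-era SolrJ; the class name, solr home path and core name are
placeholders):

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class IndexApp {
  public static void main(String[] args) throws Exception {
    System.setProperty("solr.solr.home", "/path/to/solr/home");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "core1");

    // ... add documents, commit and optimize here ...

    // Without this call the container's non-daemon threads keep the JVM alive.
    coreContainer.shutdown();
  }
}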
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/ending-a-java-app-that-uses-EmbeddedSolrServer-tp963573p964013.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: maxMergeDocs and performance tuning

2010-08-16 Thread Marc Sturlese

As far as I know, the higher you set the value, the faster the indexing
process will be (because more things are kept in memory). But depending on
what your needs are, it may not be the best option. If you set a high
mergeFactor and you want to optimize the index once the process is done, the
optimization will take longer than if the mergeFactor was very low.
This is because the optimization process compacts the many segment files; if
the mergeFactor is lower, there will be fewer files, so the optimize will be
faster.
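
For reference, these knobs live in the <indexDefaults> section of
solrconfig.xml (the values below are the stock defaults, not recommendations):

<indexDefaults>
  <mergeFactor>10</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>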
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/maxMergeDocs-and-performance-tuning-tp1162695p1168480.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: JVM GC is very frequent.

2010-08-26 Thread Marc Sturlese

http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/JVM-GC-is-very-frequent-tp1345760p1348065.html
Sent from the Solr - User mailing list archive at Nabble.com.


FieldCache.DEFAULT.getInts vs FieldCache.DEFAULT.getStringIndex. Memory usage

2010-08-26 Thread Marc Sturlese

I need to load a FieldCache for a field which is a Solr "integer" type and
has at most 3 digits. Let's say my index has 10M docs.
I am wondering what is more optimal and less memory consuming: loading a
FieldCache.DEFAULT.getInts or a FieldCache.DEFAULT.getStringIndex.

The second one will have an int[] with as many entries as the index has docs.
Additionally it will have a String[] with one entry per unique term. As I am
dealing with numbers, I will have to cast the values of the String[] to work
with them.

If I load a FieldCache.DEFAULT.getInts I will have just an int[] with the
value of the doc's field at each array position, and I will be able to work
directly with the ints... so in this case is it more optimal to use this?
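
A quick sketch of the two calls (Lucene 2.9/3.x FieldCache API; the field name,
docId and the already-opened IndexReader are just assumptions):

// one int per document; docs without a value come back as 0
int[] values = FieldCache.DEFAULT.getInts(reader, "myNumField");

// order[docId] points into lookup[], which holds one String per unique term
FieldCache.StringIndex index = FieldCache.DEFAULT.getStringIndex(reader, "myNumField");
String valueForDoc = index.lookup[index.order[docId]];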

Thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldCache-DEFAULT-getInts-vs-FieldCache-DEFAULT-getStringIndex-Memory-usage-tp1348480p1348480.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Null pointer exception when mixing highlighter & shards & q.alt

2010-09-07 Thread Marc Sturlese

I noticed that long ago.
I fixed it by doing this in HighlightComponent's finishStage:

  @Override
  public void finishStage(ResponseBuilder rb) {
    boolean hasHighlighting = true;
    if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {

      Map.Entry[] arr = new NamedList.NamedListEntry[rb.resultIds.size()];

      // TODO: make a generic routine to do automatic merging of id keyed data
      for (ShardRequest sreq : rb.finished) {
        if ((sreq.purpose & ShardRequest.PURPOSE_GET_HIGHLIGHTS) == 0) continue;
        for (ShardResponse srsp : sreq.responses) {
          NamedList hl =
              (NamedList) srsp.getSolrResponse().getResponse().get("highlighting");
          // patch bug
          if (hl != null) {
            for (int i = 0; i < hl.size(); i++) {
              ...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Null-pointer-exception-when-mixing-highlighter-shards-q-alt-tp1430353p1431253.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: what differents between SolrCloud and Solr+Hadoop

2010-09-13 Thread Marc Sturlese

Well, these are pretty different things. SolrCloud is meant to handle
distributed search in an easier way than "raw" Solr distributed search.
You have to build the shards in your own way.
Solr+Hadoop is a way to build these shards/indexes in parallel.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/what-differents-between-SolrCloud-and-Solr-Hadoop-tp1463809p1464106.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do you programatically create new cores?

2010-10-17 Thread Marc Sturlese

You have to create the core's folder, with its conf inside, under the Solr
home. Once that's done you can call the CREATE action of the admin handler:
http://wiki.apache.org/solr/CoreAdmin#CREATE
If you need to dynamically create, start and stop lots of cores there's this
patch, but I don't know its current state:
http://wiki.apache.org/solr/LotsOfCores
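
For example, once coreX/conf is in place under the Solr home, the call is
simply (host, port and names are placeholders):

http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=coreX&config=solrconfig.xml&schema=schema.xml&dataDir=data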

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-you-programatically-create-new-cores-tp1706487p1718648.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamically create new core

2010-11-02 Thread Marc Sturlese

To create the core, the folder with the confs must already exist and must be
placed in the proper place (inside the Solr home). Once you run the
create-core action, the core will be added to solr.xml and dynamically
loaded.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Dynamically-create-new-core-tp1827097p1828560.html
Sent from the Solr - User mailing list archive at Nabble.com.


Core status uptime and startTime

2010-11-03 Thread Marc Sturlese

As far as I know, in the core admin page you can find out the last time
an index was modified and committed by checking lastModified.
But what do startTime and uptime mean?
Thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Core-status-uptime-and-startTime-tp1834806p1834806.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding new field after data is already indexed

2010-11-08 Thread Marc Sturlese

>> and i index data on the basis of these fields. Now, incase i need to add a
new field, is there a way i can >> add the field without corrupting the
previous data. Is there any feature which adds a new field with a 
>> default value to the existing records.

You just have to add the new field in schema.xml so that Solr knows about
it. Already-indexed documents won't have any value in this field, but
that doesn't break anything. If you want to give them a default value you
will have to rebuild your index.

>> Is there any security mechanism/authorization check to prevent url like
>> /admin and /update to only a few users. 
As far as I know there's no out-of-the-box feature to do that.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-new-field-after-data-is-already-indexed-tp1862575p1862722.html
Sent from the Solr - User mailing list archive at Nabble.com.


about NRTCachingDirectory

2012-12-10 Thread Marc Sturlese
I have a doubt about how NRTCachingDirectory works.
As far as I've seen, it receives a delegate Directory and caches newly
created segments. So, given that MMapDirectory is usually the default:

1.- Does NRTCachingDirectory work as a sort of wrapper around MMapDirectory,
caching the new segments?

2.- If I have a master/slave setup and deploy a fully optimized index with a
single segment, and the slave is configured with NRTCachingDirectory, will it
try to cache that segment (I suppose not)?
And let's say I remove the replication and start adding docs to that slave,
creating small segments every 10 minutes; will NRTCachingDirectory by default
start caching these new small segments?
And finally, if I set up the replication again and a whole new single-segment
index is deployed, how would NRTCachingDirectory behave?

I know it's not a typical use case, but I would like to know how it behaves
in those different situations.
Thanks in advance.
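
For reference, this is the solrconfig.xml line that selects it (a minimal
sketch; in Solr 4.x NRTCachingDirectoryFactory is the default and, as far as I
understand, it wraps the standard MMap-based factory):

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory"/>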
 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/about-NRTCachingDirectory-tp4025665.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Shard timeouts on large (1B docs) Solr cluster

2012-02-03 Thread Marc Sturlese
timeAllowed can be used outside distributed search. It is used by the
TimeLimitingCollector. When the search time reaches timeAllowed it stops
searching and returns the results it could find up to that point.
This can be a problem when using incremental indexing: Lucene starts
searching from "the bottom" and new docs are inserted at the top, so
timeAllowed could cause new docs to never appear in the search results.
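
For example (the value is in milliseconds; when the limit is hit the response
header is flagged with partialResults):

http://localhost:8983/solr/select?q=foo&timeAllowed=1000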

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shard-timeouts-on-large-1B-docs-Solr-cluster-tp3691229p3713263.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to ignore indexing of duplicated documents?

2012-03-12 Thread Marc Sturlese
http://wiki.apache.org/solr/Deduplication

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-ignore-indexing-of-duplicated-documents-tp3814858p3818973.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Faceting on a date field multiple times

2012-05-04 Thread Marc Sturlese
http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceting-on-a-date-field-multiple-times-tp3961282p3961865.html
Sent from the Solr - User mailing list archive at Nabble.com.


latest patches and big picture of search grouping

2011-01-17 Thread Marc Sturlese

I need to dive into search grouping / field collapsing again. I've seen there
are lots of issues about it now.
Can someone point me to the minimum set of patches I need to run this feature
on trunk? I want to see the code of the most optimised version and what's
being done for distributed search. I think I need these:

https://issues.apache.org/jira/browse/SOLR-2068
https://issues.apache.org/jira/browse/SOLR-2205
https://issues.apache.org/jira/browse/SOLR-2066

But I am not sure if I am missing anything else.

By the way, I think the current implementation of grouped search is totally
different from what it was before, when you could choose normal or adjacent
collapse.
Can someone give me a quick big picture of the current implementation? (I
will trace the code anyway; it's just to get an idea.) Is there still a
double trip?

Thanks in advance.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/latest-patches-and-big-picture-of-search-grouping-tp2271383p2271383.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need to create dyanamic indexies base on different document workspaces

2011-04-22 Thread Marc Sturlese
In case you need to create lots of indexes and register/unregister them fast,
there is work under way: http://wiki.apache.org/solr/LotsOfCores

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-to-create-dyanamic-indexies-base-on-different-document-workspaces-tp2845919p2852410.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange performance behaviour when concurrent requests are done

2011-04-29 Thread Marc Sturlese
Any suggestion about this issue?
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-performance-behaviour-when-concurrent-requests-are-done-tp505478p2878758.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange performance behaviour when concurrent requests are done

2011-04-29 Thread Marc Sturlese
That's true. But the degradation is very big. If you launch concurrent
requests against a web app that doesn't use Solr, the time per request won't
degrade that much. To me it looks more like something synchronized is being
hit somewhere in Solr or Lucene, and that is causing this.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-performance-behaviour-when-concurrent-requests-are-done-tp505478p2878856.html
Sent from the Solr - User mailing list archive at Nabble.com.


problem with the new IndexSearcher when snapinstaller (and commit script) happen

2011-06-15 Thread Marc Sturlese
Hey there,
I've noticed a very odd behaviour with the snapinstaller and commit (using
the collectionDistribution scripts). The first time I install a new index
everything works fine. But when installing another one, I can't see the new
documents. Checking the status page of the core tells me that the index
version has changed but numDocs and maxDocs are the same. I have a simple
script that gets the version from an index reader, and it confirms that this
is not true: numDocs and maxDocs are different in the two indexes.
The index I'm trying to install is a whole new index, generated with
mergeFactor = 2 and optimized, with no compound file.

I've tried manually moving index to index.old and the snapshot.x to index
(while Tomcat is up) and manually executing:
 curl http://localhost:8080/trovit_solr/coreA/update?commit=true -H
"Content-Type: text/xml"
But the same thing happens.
Checking the logs I can see that apparently everything is fine. The new
searcher is registered and warming is properly done on it.

I would think that the problem is with some reference when opening the index
searcher. But the fact that the indexVersion changes while numDocs and
maxDocs don't leaves me unable to understand anything.

If I reload the core, numDocs and maxDocs change and everything is fine.

Any idea what could be happening here?
Thanks in advance.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3066902.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen

2011-06-15 Thread Marc Sturlese
Tests are done on Solr 1.4.
The simplest way to reproduce my problem is having 2 indexes and a Solr box
with just one core. Both indexes must have been created with the same schema.

1- Remove the index dir of the core and start the server (the core is up with
an empty index)
2- Check the status page of the core (the version should be X, and numDocs
and maxDocs zero)
3- mv index to index.old
4- mv your folderA (which contains an index) to index
5- Execute curl http://localhost:8080/solr/coreA/update?commit=true -H
"Content-Type: text/xml"
* Here the log shows me that the commit has been executed, a new IndexSearcher
has been registered and proper warming has been done.
6- Check the core status page (here everything has changed:
version, numDocs, maxDocs)

If I now repeat steps 3, 4, 5 (this time using folderB with another index),
when I do step 6 the indexVersion has changed but numDocs and maxDocs stay
the same, which I can't understand in any way (opening the index with my
script shows me that they are not the same).

I've ended up doing this test after noticing the problem with the
snapinstaller and commit.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3067042.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen

2011-06-15 Thread Marc Sturlese
I don't know if this has something to do with the problem, but some of
the files of the indexes have the same size and name (in all the indexes
except the empty one).
I have also realized that when moving back to the empty index and
committing, numDocs and maxDocs change. Once I'm on the empty index, if I
move to another one it works too. The problem happens when moving from a
non-empty index to another non-empty index.
That's why I think that the names and sizes of some files could have
something to do with the problem:

./index.1:
total 702024
drwxr-sr-x  12 marc  admin408 15 Jun 11:01 .
drwxr-xr-x  10 marc  admin340 15 Jun 16:04 ..
-rw-r--r--   1 marc  admin  269347737 15 Jun 10:57 _3.fdt
-rw-r--r--   1 marc  admin2067804 15 Jun 10:57 _3.fdx
-rw-r--r--   1 marc  admin463 15 Jun 10:57 _3.fnm
-rw-r--r--   1 marc  admin   40372030 15 Jun 10:59 _3.frq
-rw-r--r--   1 marc  admin1033904 15 Jun 10:59 _3.nrm
-rw-r--r--   1 marc  admin   27021337 15 Jun 11:00 _3.prx
-rw-r--r--   1 marc  admin 234891 15 Jun 11:00 _3.tii
-rw-r--r--   1 marc  admin   19330416 15 Jun 11:01 _3.tis
-rw-r--r--   1 marc  admin 20 15 Jun 11:01 segments.gen
-rw-r--r--   1 marc  admin298 15 Jun 11:01 segments_2

./index.2:
total 701296
drwxr-sr-x  12 marc  admin408 15 Jun 11:11 .
drwxr-xr-x  10 marc  admin340 15 Jun 16:04 ..
-rw-r--r--   1 marc  admin  269044254 15 Jun 11:09 _3.fdt
-rw-r--r--   1 marc  admin2068116 15 Jun 11:09 _3.fdx
-rw-r--r--   1 marc  admin463 15 Jun 11:09 _3.fnm
-rw-r--r--   1 marc  admin   40320465 15 Jun 11:10 _3.frq
-rw-r--r--   1 marc  admin1034060 15 Jun 11:10 _3.nrm
-rw-r--r--   1 marc  admin   26967519 15 Jun 11:11 _3.prx
-rw-r--r--   1 marc  admin 235895 15 Jun 11:11 _3.tii
-rw-r--r--   1 marc  admin   19372446 15 Jun 11:11 _3.tis
-rw-r--r--   1 marc  admin 20 15 Jun 11:11 segments.gen
-rw-r--r--   1 marc  admin298 15 Jun 11:11 segments_2

./index.empty:
total 16
drwxr-xr-x   4 marc  admin  136 15 Jun 10:45 .
drwxr-xr-x  10 marc  admin  340 15 Jun 16:04 ..
-rw-r--r--   1 marc  admin   20 15 Jun 10:45 segments.gen
-rw-r--r--   1 marc  admin   32 15 Jun 10:45 segments_1



--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3067466.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen

2011-06-15 Thread Marc Sturlese
I have some more info!
I've built another index, bigger than the others, so the names of the files
are not the same. This way, if I move from any of the other indexes to the
bigger one or vice versa, it works (I can see the changes in the version,
numDocs and maxDocs)! So, I think it is related to the names of the files.
Maybe the server gets confused by the pointers to the older index files or
something like that?
The bigger index looks like:

./index.big:
total 4181088
drwxr-xr-x  12 marc  admin 408 15 Jun 16:46 .
drwxr-xr-x  11 marc  admin 374 15 Jun 16:48 ..
-rw-r--r--   1 marc  admin  1666038160 15 Jun 16:43 _4.fdt
-rw-r--r--   1 marc  admin 9178780 15 Jun 16:43 _4.fdx
-rw-r--r--   1 marc  admin 477 15 Jun 16:43 _4.fnm
-rw-r--r--   1 marc  admin   232687972 15 Jun 16:44 _4.frq
-rw-r--r--   1 marc  admin 4589392 15 Jun 16:44 _4.nrm
-rw-r--r--   1 marc  admin   161931683 15 Jun 16:45 _4.prx
-rw-r--r--   1 marc  admin  824985 15 Jun 16:45 _4.tii
-rw-r--r--   1 marc  admin65438631 15 Jun 16:45 _4.tis
-rw-r--r--   1 marc  admin  20 15 Jun 16:45 segments.gen
-rw-r--r--   1 marc  admin 298 15 Jun 16:45 segments_2


--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3067657.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen [SOLVED]

2011-06-15 Thread Marc Sturlese
I've found the problem, in case someone is interested.
It's because of IndexReader.reopen(). If it is enabled, when opening a
new searcher due to the commit, this code is executed (in
SolrCore.getSearcher(boolean forceNew, boolean returnSearcher, final
Future[] waitSearcher)):
  ...
  if (newestSearcher != null && solrConfig.reopenReaders
  && indexDirFile.equals(newIndexDirFile)) {
IndexReader currentReader = newestSearcher.get().getReader();
IndexReader newReader = currentReader.reopen();

if (newReader == currentReader) {
  currentReader.incRef();
}

tmp = new SolrIndexSearcher(this, schema, "main", newReader, true,
true);
 
  } else {
IndexReader reader =
getIndexReaderFactory().newReader(getDirectoryFactory().open(newIndexDir),
true);
tmp = new SolrIndexSearcher(this, schema, "main", reader, true,
true);
  }
  ...

If the names of the segments haven't changed, IndexReader.reopen thinks that
they haven't actually changed (but in my case they have: the index files have
the same names but contain different docs), so instead of opening new readers
for the segments it gives back the same one, and the changes can't be seen by
the new IndexSearcher.
Accepting that performance gets worse, disabling reopenReaders in
solrconfig.xml solves the problem (and it still performs better than
reloading the whole core).
Does anyone know whether this still happens with Lucene 3.2?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3068956.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: embeded solrj doesn't refresh index

2011-07-22 Thread Marc Sturlese
Are you indexing with full-import? If so, and the resulting index has a
similar number of docs to the one you had before, try setting reopenReaders
to false in solrconfig.xml.
* You have to send the commit, of course.
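
For reference, the switch lives in the <mainIndex> section of solrconfig.xml
(Solr 1.4/3.x):

<mainIndex>
  ...
  <reopenReaders>false</reopenReaders>
</mainIndex>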

--
View this message in context: 
http://lucene.472066.n3.nabble.com/embeded-solrj-doesn-t-refresh-index-tp3184321p3190892.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boost documents based on the number of their fields

2011-08-19 Thread Marc Sturlese
You have different options here. You can give more boost at indexing time to
the documents that have the fields you want set. For this to take effect you
will have to reindex and set omitNorms="false" on the fields you are going
to search. The same concept can be applied to boost single fields instead of
the whole document.
Another option would be to use boost queries at search time, such as:
bq=video:[* TO *]^100 (this gives more boost to the documents that have
any value in the video field).

The second one is much easier to play with, as you don't have to reindex
every time you change a value. On the other hand, you pay the performance
penalty of running one extra query.
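
For the first option, this is roughly what the index-time boost looks like in
the XML update format (document and field names are just examples):

<add>
  <doc boost="2.0">
    <field name="id">doc1</field>
    <field name="video" boost="1.5">http://example.com/video.mp4</field>
  </doc>
</add>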


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-documents-based-on-the-number-of-their-fields-tp3266875p3267628.html
Sent from the Solr - User mailing list archive at Nabble.com.


offsets issues with multiword synonyms since LUCENE_33

2012-08-14 Thread Marc Sturlese
Has anyone noticed this problem and solved it somehow (without setting
LUCENE_33 as the luceneMatchVersion in solrconfig.xml)?
https://issues.apache.org/jira/browse/LUCENE-3668

Thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/offsets-issues-with-multiword-synonyms-since-LUCENE-33-tp4001195.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: offsets issues with multiword synonyms since LUCENE_33

2012-08-14 Thread Marc Sturlese
Well an example would be:
synonyms.txt:
huge,big size

Then I have the docs:
1- The huge fox attacks first
2- The big size fox attacks first

Then if I query for huge, the highlights for each document are:

1- The huge fox attacks first
2- The big size fox attacks first

The analyzer looks like this:

<fieldType name="sy_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    ...
  </analyzer>
  <analyzer type="query">
    ...
  </analyzer>
</fieldType>

This was working with a previous version of Solr (I couldn't make it work
with 3.6, 4.0-alpha or 4.0-beta).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/offsets-issues-with-multiword-synonyms-since-LUCENE-33-tp4001195p4001213.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FieldCollapsing: Two response elements returned?

2009-07-28 Thread Marc Sturlese

That's probably because you are using both the CollapseComponent and the
QueryComponent. I think the 2 or 3 latest patches allow full replacement of
the QueryComponent. You should just replace:

<searchComponent name="query" class="org.apache.solr.handler.component.QueryComponent" />

with:

<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />

This will sort out your problem and make response times faster.



Jay Hill wrote:
> 
> I'm doing some testing with field collapsing, and early results look good.
> One thing seems odd to me however. I would expect to get back one block of
> results, but I get two - the first one contains the collapsed results, the
> second one contains the full non-collapsed results:
> 
>  ... 
>  ... 
> 
> This seems somewhat confusing. Is this intended or is this a bug?
> 
> Thanks,
> -Jay
> 
> 

-- 
View this message in context: 
http://www.nabble.com/FieldCollapsing%3A-Two-response-elements-returned--tp24690426p24693960.html
Sent from the Solr - User mailing list archive at Nabble.com.



update some index documents after indexing process is done with DIH

2009-07-28 Thread Marc Sturlese

Hey there,
I would like to be able to do something like this: after the indexing process
is done with DIH, I would like to open an IndexReader, iterate over all docs,
modify some of them depending on others and delete some others. I can easily
do this coding directly with Lucene, but I would like to know if there's a
way to do it with Solr using the SolrDocument or SolrInputDocument classes.
I have thought of using SolrJ or the DIH onImportEnd listener, but I am not
sure I can get an IndexReader there.
Any advice?
Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24695947.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: update some index documents after indexing process is done with DIH

2009-07-28 Thread Marc Sturlese

Ok, but if I handle it in a newSearcher listener it will be executed every
time I reload a core, won't it? The thing is that I want to use an
IndexReader to load some doc fields of the index into a HashMap and,
depending on the values of some docs' fields, modify other docs. It's very
memory consuming (I have tested it with a simple Lucene script). That's why I
wanted to do it just after the indexing process.

My ideal case would be to do it in the commit function of
DirectUpdateHandler2.java, just before
writer.optimize(cmd.maxOptimizeSegments); is executed. But I don't want to
mess with that code... so I am trying to find the best way to do this as a
plugin rather than as a hack.

Thanks in advance


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> It is best handled as a 'newSearcher' listener in solrconfig.xml.
> onImportEnd is invoked before committing
> 
> On Tue, Jul 28, 2009 at 3:13 PM, Marc Sturlese
> wrote:
>>
>> Hey there,
>> I would like to be able to do something like: After the indexing process
>> is
>> done with DIH I would like to open an indexreader, iterate over all docs,
>> modify some of them depending on others and delete some others. I can
>> easy
>> do this directly coding with lucene but would like to know if there's a
>> way
>> to do it with Solr using SolrDocument or SolrInputDocument classes.
>> I have thougth in using SolrJ or DIH listener onImportEnd but not sure if
>> I
>> can get an IndexReader in there.
>> Any advice?
>> Thanks in advance
>> --
>> View this message in context:
>> http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24695947.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 
> 

-- 
View this message in context: 
http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24696872.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: update some index documents after indexing process is done with DIH

2009-07-28 Thread Marc Sturlese

That really sounds like the best way to reach my goal. How could I invoke a
listener from newSearcher? Would it be something like:

<listener event="newSearcher" class="MyCustomListener">
  <arr name="queries">
    <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
    <lst> <str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst>
    <lst> <str name="q">static newSearcher warming query from solrconfig.xml</str> </lst>
  </arr>
</listener>

And MyCustomListener would be the class that opens the reader:
   
try {
    RefCounted<SolrIndexSearcher> searchHolder = null;
    try {
      searchHolder = dataImporter.getCore().getSearcher();
      IndexReader reader = searchHolder.get().getReader();

      // Here I iterate over the reader doing document modifications

    } finally {
      if (searchHolder != null) searchHolder.decref();
    }
} catch (Exception ex) {
    LOG.info("error");
}

Finally, to access the documents and add fields to some of them, I have
thought of using the SolrDocument classes. Can you please point me to where
something similar is done in the Solr source (I mean the creation of
SolrDocuments and their conversion to proper Lucene documents)?

Does this way of reaching the goal make sense?

Thanks in advance



Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> when a core is reloaded the event fired is firstSearcher. newSearcher
> is fired when a commit happens
> 
> 
> On Tue, Jul 28, 2009 at 4:19 PM, Marc Sturlese
> wrote:
>>
>> Ok, but if I handle it in a newSearcher listener it will be executed
>> every
>> time I reload a core, isn't it? The thing is that I want to use an
>> IndexReader to load in a HashMap some doc fields of the index and
>> depending
>> of the values of some field docs modify other docs. Its very memory
>> consuming (I have tested it with a simple lucene script). Thats why I
>> wanted
>> to do it just after the indexing process.
>>
>> My ideal case would be to do it in the commit function of
>> DirectUpdatehandler2.java just before
>> writer.optimize(cmd.maxOptimizeSegments); is executed. But I don't want
>> to
>> mess that code... so trying to find out the best way to do that as a
>> plugin
>> instead of a hack as possible.
>>
>> Thanks in advance
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> It is best handled as a 'newSearcher' listener in solrconfig.xml.
>>> onImportEnd is invoked before committing
>>>
>>> On Tue, Jul 28, 2009 at 3:13 PM, Marc Sturlese
>>> wrote:
>>>>
>>>> Hey there,
>>>> I would like to be able to do something like: After the indexing
>>>> process
>>>> is
>>>> done with DIH I would like to open an indexreader, iterate over all
>>>> docs,
>>>> modify some of them depending on others and delete some others. I can
>>>> easy
>>>> do this directly coding with lucene but would like to know if there's a
>>>> way
>>>> to do it with Solr using SolrDocument or SolrInputDocument classes.
>>>> I have thougth in using SolrJ or DIH listener onImportEnd but not sure
>>>> if
>>>> I
>>>> can get an IndexReader in there.
>>>> Any advice?
>>>> Thanks in advance
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24695947.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24696872.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 
> 

-- 
View this message in context: 
http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24697751.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: update some index documents after indexing process is done with DIH

2009-07-29 Thread Marc Sturlese

From the newSearcher(..) method of a custom event listener which extends
AbstractSolrEventListener I can access the SolrIndexSearcher and all core
properties, but I can't get a SolrIndexWriter. Do you know how I can get a
SolrIndexWriter from there? That way I would be able to modify the documents
(I need to modify them depending on the values of other documents; that's why
I can't do it with a DIH delta-import).
Thanks in advance


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> On Tue, Jul 28, 2009 at 5:17 PM, Marc Sturlese
> wrote:
>>
>> That really sounds the best way to reach my goal. How could I invoque a
>> listener from the newSearcher?Would be something like:
>>    
>>      
>>         solr 0 > name="rows">10 
>>         rocks 0 > name="rows">10 
>>        static newSearcher warming query from
>> solrconfig.xml
>>      
>>    
>>    
>>
>> And MyCustomListener would be the class who open the reader:
>>
>>        RefCounted searchHolder = null;
>>        try {
>>          searchHolder = dataImporter.getCore().getSearcher();
>>          IndexReader reader = searchHolder.get().getReader();
>>
>>          //Here I iterate over the reader doing docuemnt modifications
>>
>>        } finally {
>>           if (searchHolder != null) searchHolder.decref();
>>        }
>>        } catch (Exception ex) {
>>            LOG.info("error");
>>        }
> 
> you may not be able to access the DIH API from a newSearcher event .
> But the API would give you the searcher directly as a method
> parameter.
>>
>> Finally, to access to documents and add fields to some of them, I have
>> thought in using SolrDocument classes. Can you please point me where
>> something similar is done in solr source (I mean creation of
>> SolrDocuemnts
>> and conversion of them to proper lucene docuements).
>>
>> Does this way for reaching the goal makes sense?
>>
>> Thanks in advance
>>
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> when a core is reloaded the event fired is firstSearcher. newSearcher
>>> is fired when a commit happens
>>>
>>>
>>> On Tue, Jul 28, 2009 at 4:19 PM, Marc Sturlese
>>> wrote:
>>>>
>>>> Ok, but if I handle it in a newSearcher listener it will be executed
>>>> every
>>>> time I reload a core, isn't it? The thing is that I want to use an
>>>> IndexReader to load in a HashMap some doc fields of the index and
>>>> depending
>>>> of the values of some field docs modify other docs. Its very memory
>>>> consuming (I have tested it with a simple lucene script). Thats why I
>>>> wanted
>>>> to do it just after the indexing process.
>>>>
>>>> My ideal case would be to do it in the commit function of
>>>> DirectUpdatehandler2.java just before
>>>> writer.optimize(cmd.maxOptimizeSegments); is executed. But I don't want
>>>> to
>>>> mess that code... so trying to find out the best way to do that as a
>>>> plugin
>>>> instead of a hack as possible.
>>>>
>>>> Thanks in advance
>>>>
>>>>
>>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>>>
>>>>> It is best handled as a 'newSearcher' listener in solrconfig.xml.
>>>>> onImportEnd is invoked before committing
>>>>>
>>>>> On Tue, Jul 28, 2009 at 3:13 PM, Marc
>>>>> Sturlese
>>>>> wrote:
>>>>>>
>>>>>> Hey there,
>>>>>> I would like to be able to do something like: After the indexing
>>>>>> process
>>>>>> is
>>>>>> done with DIH I would like to open an indexreader, iterate over all
>>>>>> docs,
>>>>>> modify some of them depending on others and delete some others. I can
>>>>>> easy
>>>>>> do this directly coding with lucene but would like to know if there's
>>>>>> a
>>>>>> way
>>>>>> to do it with Solr using SolrDocument or SolrInputDocument classes.
>>>>>> I have thougth in using SolrJ or DIH listener onImportEnd but not
>>>>>> sure
>>>>>> if
>>>>>> I
>>>>>> can get an IndexReader in there.
>>>>>> Any advice?
>>>>>> Thanks in advance
>>>>>> -

Re: update some index documents after indexing process is done with DIH

2009-07-30 Thread Marc Sturlese

Hoss, I see what you mean. I am trying to implement a custom UpdateProcessor,
following:
http://wiki.apache.org/solr/UpdateRequestProcessor
What is confusing me now is that I have to implement my logic in
processCommit, as you said:

>>you'll still need the "double commit" (once so you can see the 
>>main changes, and once so the rest of the world can see your 
>>modifications) but you can execute them both directly in your 
>>processCommit(CommitUpdateCommand)

I have noticed that in processAdd you have access to the concrete
SolrInputDocument you are going to add:
SolrInputDocument doc = cmd.getSolrInputDocument();

But in processCommit, having access to the core, I can get the IndexReader,
but I still don't know how to get the IndexWriter and the SolrInputDocuments
there.
My idea is to do something like:

    @Override
    public void processCommit(CommitUpdateCommand cmd) throws IOException {
      // first commit, so that my process can see the new documents
      // open and iterate over the reader and build a list of SolrDocuments
      // close the reader
      // open a writer and update the docs in the list
      // close the writer and do a second commit that shows my changes to the world

      if (next != null)
        next.processCommit(cmd);
    }
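
For reference, I would wire the processor in through a chain in
solrconfig.xml along these lines (the custom factory class name is made up):

<updateRequestProcessorChain name="mychain" default="true">
  <processor class="com.example.MyCommitUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>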

As I understand the process, the CommitUpdateCommand will be sent to
DirectUpdateHandler2, which will do the actual commit, via the
UpdateRequestProcessor chain.
Am I on the right track? I haven't dealt with a custom UpdateProcessor for
doing something after a commit is executed, so I am a bit confused...

Thanks in advance.




hossman wrote:
> 
> 
> This thread all sounds really kludgy ... among other things the 
> newSearcher listener is going to need to some how keep track of when it 
> was called as a result of a "real" commit, vs when it was called as the 
> result of a commit it itself triggered to make changes.
> 
> wouldn't an easier place to implement this logic be in an UpdateProcessor?  
> you'll still need the "double commit" (once so you can see the 
> main changes, and once so the rest of the world can see your 
> modifications) but you can execute them both directly in your 
> processCommit(CommitUpdateCommand) method (so you don't have to worry 
> about being able to tell them apart)
> 
> : Date: Thu, 30 Jul 2009 10:14:16 +0530
> : From: Noble Paul നോബിള്‍  नोब्ळ्
> : Reply-To: solr-user@lucene.apache.org, noble.p...@gmail.com
> : To: solr-user@lucene.apache.org
> : Subject: Re: update some index documents after indexing process is done
> with 
> : DIH
> : 
> : If you make your EventListener implements SolrCoreAware you can get
> : hold of the core on inform. use that to get hold of the
> : SolrIndexWriter
> : 
> : On Wed, Jul 29, 2009 at 9:20 PM, Marc Sturlese
> wrote:
> : >
> : > From the newSearcher(..) of a CustomEventListener which extends of
> : > AbstractSolrEventListener  can access to SolrIndexSearcher and all
> core
> : > properties but can't get a SolrIndexWriter. Do you now how can I get
> from
> : > there a SolrIndexWriter? This way I would be able to modify the
> documents (I
> : > need to modify them depending on values of other documents, that's why
> I
> : > can't do it with DIH delta-import).
> : > Thanks in advance
> : >
> : >
> : > Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> : >>
> : >> On Tue, Jul 28, 2009 at 5:17 PM, Marc
> Sturlese
> : >> wrote:
> : >>>
> : >>> That really sounds the best way to reach my goal. How could I
> invoque a
> : >>> listener from the newSearcher?Would be something like:
> : >>>    
> : >>>      
> : >>>         solr 0  : >>> name="rows">10 
> : >>>         rocks 0
>  : >>> name="rows">10 
> : >>>        static newSearcher warming query from
> : >>> solrconfig.xml
> : >>>      
> : >>>    
> : >>>    
> : >>>
> : >>> And MyCustomListener would be the class who open the reader:
> : >>>
> : >>>        RefCounted searchHolder = null;
> : >>>        try {
> : >>>          searchHolder = dataImporter.getCore().getSearcher();
> : >>>          IndexReader reader = searchHolder.get().getReader();
> : >>>
> : >>>          //Here I iterate over the reader doing docuemnt
> modifications
> : >>>
> : >>>        } finally {
> : >>>           if (searchHolder != null) searchHolder.decref();
> : >>>        }
> : >>>

Re: update some index documents after indexing process is done with DIH

2009-07-31 Thread Marc Sturlese

: If you make your EventListener implements SolrCoreAware you can get
: hold of the core on inform. use that to get hold of the
: SolrIndexWriter 

Implementing SolrCoreAware I can get hold of the core and easily get hold of
a SolrIndexSearcher, and so a reader. But I can't see a way to get hold of
the SolrIndexWriter just from the core...



Marc Sturlese wrote:
> 
> Hey there,
> I would like to be able to do something like: After the indexing process
> is done with DIH I would like to open an indexreader, iterate over all
> docs, modify some of them depending on others and delete some others. I
> can easy do this directly coding with lucene but would like to know if
> there's a way to do it with Solr using SolrDocument or SolrInputDocument
> classes.
> I have thougth in using SolrJ or DIH listener onImportEnd but not sure if
> I can get an IndexReader in there.
> Any advice?
> Thanks in advance
> 

-- 
View this message in context: 
http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24755320.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is negative boost possible?

2009-08-19 Thread Marc Sturlese


:>the only way to "negative boost" is to "positively boost" the inverse...
:>
:>  (*:* -field1:value_to_penalize)^10

This will do the job as well, since bq supports pure negative queries (at
least in trunk):
bq=-field1:value_to_penalize^10

http://wiki.apache.org/solr/SolrRelevancyFAQ#head-76e53db8c5fd31133dc3566318d1aad2bb23e07e


hossman wrote:
> 
> 
> : Use decimal figure less than 1, e.g. 0.5, to express less importance.
> 
> but that's still a positive boost ... it still increases the scores of 
> documents that match.
> 
> the only way to "negative boost" is to "positively boost" the inverse...
> 
>   (*:* -field1:value_to_penalize)^10
> 
> : > I am looking for a way to assign negative boost to a term in Solr
> query.
> : > Our use scenario is that we want to boost matching documents that are
> : > updated recently and penalize those that have not been updated for a
> long
> : > time.  There are other terms in the query that would affect the scores
> as
> : > well.  For example we construct a query similar to this:
> : > 
> : > *:* field1:value1^2  field2:value2^2 lastUpdateTime:[NOW/DAY-90DAYS TO
> *]^5
> : > lastUpdateTime:[* TO NOW/DAY-365DAYS]^-3
> : > 
> : > I notice it's not possible to simply use a negative boosting factor in
> the
> : > query.  Is there any way to achieve such result?
> : > 
> : > Regards,
> : > Shi Quan He
> : > 
> : >   
> 
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Is-negative-boost-possible--tp25025775p25039059.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Remove data from index

2009-08-20 Thread Marc Sturlese

As far as I know you can not do that with DIH. What size is your index?
Probably the best you can do is index from scratch again with full-import.

clico wrote:
> 
> I hope it could be a solution.
> 
> But I think I understood that u can use deletePkQuery like this
> 
> "select document_id from table_document where statusDeleted= 'Y'"
> 
> In my case I have no status like "statusDeleted".
> 
> The request I would like to write is
> 
> "Delete from my solr Index the id that are no longer present in my
> table_document"
> 
> With Lucene I had a way to do that : 
> open IndexReader,
> for each lucene document : check in table_document and remove in lucene
> index if document is no longer present in the table
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Remove-data-from-index-tp25063736p25063986.html
Sent from the Solr - User mailing list archive at Nabble.com.



Optimizing a query to sort results alphabetically for a determinated field

2009-08-24 Thread Marc Sturlese

Hey there, I need to sort my query results alphabetically for a particular
field called "town". This field is analyzed with a KeywordAnalyzer and isn't
multiValued. Note that some docs don't have this field.
Doing just:

http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town
asc

Will give me back the results sorted alphabetically, but will put the docs
that don't have this field (town) at the beginning.
I want them at the end, or I want them not to appear. This query solves the
problem:

http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town
asc&fq=town:[a TO z]

But applying this filter (fq=town:[a TO z]) is definitely not good in terms
of memory, speed and number of clauses...
Is there any way to do something similar with a more optimized query?
Thanks in advance!

-- 
View this message in context: 
http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113379.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Optimizing a query to sort results alphabetically for a determinated field

2009-08-24 Thread Marc Sturlese

Yes, but I thought it was just for sortable fields: sint, sfloat, sdouble, slong.
Can I apply "sortMissingLast" to text fields analyzed with KeywordAnalyzer?

Constantijn Visinescu wrote:
> 
> There's a "sortMissingLast" true/false property that you can set on your
> fielType definitions in the schema.
> 
> On Mon, Aug 24, 2009 at 11:58 AM, Marc Sturlese
> wrote:
> 
>>
>> Hey there, I need to sort my query results alphabetically for a
>> determinated
>> field called "town". This field is analyzed with a KeywordAnalyzer and
>> isn't
>> multiValued. Add that some docs doesn't doesn'h have this field.
>> Doing just:
>>
>>
>> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town
>> asc
>>
>> Will give me back the results sorted alphabetically but will put the docs
>> that doesn't have this field (town) at the begining.
>> I want them at the end or I want them not to apear. This query solves the
>> problem:
>>
>>
>> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town
>> asc&fq=town:[a<http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town%0Aasc&fq=town:%5Ba>TO
>> z]
>>
>> But applying this filter: fq=town:[a TO z] is definitely not good in
>> terms
>> of memory, speed and clauses...
>> Is there any way to do something similar but with a more optimized query?
>> Thanks in advance!
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113379.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113637.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Optimizing a query to sort results alphabetically for a determinated field

2009-08-24 Thread Marc Sturlese

It just worked. Thanks a lot! Good to know sortMissingLast works not just in
sortable fields

Constantijn Visinescu wrote:
> 
> not 100% sure  but the example schema has:
>  omitNorms="true"/>
> 
> So i'd say give it a go and see what happens ;)
> 
> On Mon, Aug 24, 2009 at 12:24 PM, Marc Sturlese
> wrote:
> 
>>
>> Yes but I thought it was just for sortable fields:
>> sint,sfloat,sdouble,slong.
>> Can I apply "sortMissingLast"to  text fields analyzed with
>> KeywordAnalyzer?
>>
>> Constantijn Visinescu wrote:
>> >
>> > There's a "sortMissingLast" true/false property that you can set on
>> your
>> > fielType definitions in the schema.
>> >
>> > On Mon, Aug 24, 2009 at 11:58 AM, Marc Sturlese
>> > wrote:
>> >
>> >>
>> >> Hey there, I need to sort my query results alphabetically for a
>> >> determinated
>> >> field called "town". This field is analyzed with a KeywordAnalyzer and
>> >> isn't
>> >> multiValued. Add that some docs doesn't doesn'h have this field.
>> >> Doing just:
>> >>
>> >>
>> >>
>> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town
>> >> asc
>> >>
>> >> Will give me back the results sorted alphabetically but will put the
>> docs
>> >> that doesn't have this field (town) at the begining.
>> >> I want them at the end or I want them not to apear. This query solves
>> the
>> >> problem:
>> >>
>> >>
>> >>
>> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town
>> >> asc&fq=town:[a<
>> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town%0Aasc&fq=town:%5Ba
>> >TO
>> >> z]
>> >>
>> >> But applying this filter: fq=town:[a TO z] is definitely not good in
>> >> terms
>> >> of memory, speed and clauses...
>> >> Is there any way to do something similar but with a more optimized
>> query?
>> >> Thanks in advance!
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113379.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113637.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25114941.html
Sent from the Solr - User mailing list archive at Nabble.com.



Best way to do a lucene matchAllDocs not using q.alt=*:*

2009-09-03 Thread Marc Sturlese

Hey there,
I need a query to get the total number of documents in my index. I can get
it if I do this using the DismaxRequestHandler:
q.alt=*:*&facet=false&hl=false&rows=0
I have noticed this query is very memory consuming. Is there a more
optimized way in trunk to get the total number of documents in my index?
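What I am after is roughly something like the following (just a sketch, assuming a custom handler or component where the SolrQueryRequest is available):

    // read the document count straight from the index reader,
    // without running a *:* query; req is the incoming SolrQueryRequest
    SolrIndexSearcher searcher = req.getSearcher();
    int numDocs = searcher.getReader().numDocs();  // live docs, deletes excluded
    int maxDoc = searcher.getReader().maxDoc();    // includes deleted docs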
Thanks in advance

-- 
View this message in context: 
http://www.nabble.com/Best-way-to-do-a-lucene-matchAllDocs-not-using-q.alt%3D*%3A*-tp25277585p25277585.html
Sent from the Solr - User mailing list archive at Nabble.com.



DIH applying variosu transformers to a field

2009-09-08 Thread Marc Sturlese

Hey there, I am using DIH to import a db table and have written a custom
transformer following the example:

    package foo;

    public class CustomTransformer1 {
        public Object transformRow(Map<String, Object> row) {
            String artist = (String) row.get("artist");  // cast needed, row values are Objects
            if (artist != null)
                row.put("ar", artist.trim());
            return row;
        }
    }
I'm wondering: if I write a second transformer and put it in data-config.xml
after CustomTransformer1, will the input row in the second transformer be the
row already transformed by CustomTransformer1, or will it be the original row
value?
I would just need to index the result of transformer2 (whose input would be
the output of transformer1).

The config would look like an entity with both transformers listed, e.g.
transformer="foo.CustomTransformer1,foo.CustomTransformer2" (I also found
https://issues.apache.org/jira/browse/SOLR-1033 but am not sure if it's what
I am asking for).
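For reference, a minimal sketch of the second transformer I have in mind, assuming the row it receives already contains the trimmed "ar" value produced by CustomTransformer1 (class and field names are just illustrative):

    package foo;

    public class CustomTransformer2 {
        public Object transformRow(Map<String, Object> row) {
            // assumes "ar" was already put into the row by CustomTransformer1
            String trimmedArtist = (String) row.get("ar");
            if (trimmedArtist != null)
                row.put("ar", trimmedArtist.toLowerCase());  // example second-step change
            return row;
        }
    }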
Thanks in advance




-- 
View this message in context: 
http://www.nabble.com/DIH-applying-variosu-transformers-to-a-field-tp25342449p25342449.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Marc Sturlese

Doing this you will send the dump where you want:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump

Then you can open the dump with jhat:
jhat /path/to/the/dump/your_stack.bin

It will probably give you an OutOfMemoryException itself due to the large size of
the dump. If you can give more memory to the JVM running jhat, do:

jhat -J-mx2000m my_stack.bin

Then you can analyze the heap as it was at the moment of the OutOfMemoryError by browsing to:

http://localhost:7000

Please let me know if you find something. I experienced the same a while ago
and couldn't fix the problem.


Jeff Newburn wrote:
> 
> Added the parameter and it didn't seem to dump when it hit the gc limit
> error.  Any other thoughts?
> 
> -- 
> Jeff Newburn
> Software Engineer, Zappos.com
> jnewb...@zappos.com - 702-943-7562
> 
> 
>> From: Bill Au 
>> Reply-To: 
>> Date: Thu, 1 Oct 2009 12:16:53 -0400
>> To: 
>> Subject: Re: Solr Trunk Heap Space Issues
>> 
>> You probably want to add the following command line option to java to
>> produce a heap dump:
>> 
>> -XX:+HeapDumpOnOutOfMemoryError
>> 
>> Then you can use jhat to see what's taking up all the space in the heap.
>> 
>> Bill
>> 
>> On Thu, Oct 1, 2009 at 11:47 AM, Mark Miller 
>> wrote:
>> 
>>> Jeff Newburn wrote:
 I am trying to update to the newest version of solr from trunk as of
 May
 5th.  I updated and compiled from trunk as of yesterday (09/30/2009).
>>>  When
 I try to do a full import I am receiving a GC heap error after changing
 nothing in the configuration files.  Why would this happen in the most
 recent versions but not in the version from a few months ago.
>>> Good question. The error means its spending too much time trying to
>>> garbage collect without making much progress.
>>> Why so much more garbage to collect just by updating? Not sure...
>>> 
 The stack
 trace is below.
 
 Oct 1, 2009 8:34:32 AM
>>> org.apache.solr.update.processor.LogUpdateProcessor
 finish
 INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316,
>>> 167353,
 ...(83 more)]} 0 35991
 Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
 at java.util.Arrays.copyOfRange(Arrays.java:3209)
 at java.lang.String.(String.java:215)
 at
 com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384)
 at
>>> com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
 at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280)
 at
>>> org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
 at
 
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentSt
 reamHandlerBase.java:54)
 at
 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
 java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at
 
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:3
 38)
 at
 
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
 241)
 at
 
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
 FilterChain.java:235)
 at
 
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
 ain.java:206)
 at
 
>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
 va:233)
 at
 
>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
 va:175)
 at
 
>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128
 )
 at
 
>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102
 )
 at
 
>>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
 :109)
 at
 
>>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at
 
>>> org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:
 879)
 at
 
>>> org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(H
 ttp11NioProtocol.java:719)
 at
 
>>> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:
 2080)
 at
 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
 va:886)
 at
 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
 08)
 at java.lang.Thread.run(Thread.java:619)
 
 Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
 INFO: [zeta-main] webapp=/solr path=/update params={} status=500
>>> QTime=5265
 Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryEr

Re: Solr Trunk Heap Space Issues

2009-10-05 Thread Marc Sturlese

I think it doesn't make sense to enable warming if your Solr instance is just
for indexing purposes (it's different if you use it for search as well). You
could also comment out the caches in solrconfig.xml.
Setting queryResultWindowSize and queryResultMaxDocsCached to zero might
help too... (but if caches and warming are removed from solrconfig.xml I
think these two parameters do nothing)

Jeffery Newburn wrote:
> 
> Ah yes we do have some warming queries which would look like a search. 
> Did
> that side change enough to push up the memory limits where we would run
> out
> like this?  Also, would FastLRU cache make a difference?
> -- 
> Jeff Newburn
> Software Engineer, Zappos.com
> jnewb...@zappos.com - 702-943-7562
> 
> 
>> From: Yonik Seeley 
>> Reply-To: 
>> Date: Fri, 2 Oct 2009 00:53:46 -0400
>> To: 
>> Subject: Re: Solr Trunk Heap Space Issues
>> 
>> On Thu, Oct 1, 2009 at 8:45 PM, Jeffery Newburn 
>> wrote:
>>> I loaded the jvm and started indexing. It is a test server so unless
>>> some
>>> errant query came in then no searching. Our instance has only 512mb but
>>> my
>>> concern is the obvious memory requirement leap since it worked before.
>>> What
>>> other data would be helpful with this?
>> 
>> Interesting... not too much should have changed for memory
>> requirements on the indexing side.
>> TokenStreams are now reused (and hence cached) per thread... but that
>> normally wouldn't amount to much.
>> 
>> There was recently another bug where compound file format was being
>> used regardless of the config settings... but I think that was fixed
>> on the 29th.
>> 
>> Maybe you were already close to the limit required?
>> Also, your heap dump did show LRUCache taking up 170MB, and only
>> searches populate that (perhaps you have warming searches configured
>> on this server?)
>> 
>> -Yonik
>> http://www.lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>>> 
>>> 
>>> On Oct 1, 2009, at 5:14 PM, "Mark Miller"  wrote:
>>> 
 Jeff Newburn wrote:
> 
> Ok I was able to get a heap dump from the GC Limit error.
> 
> 1 instance of LRUCache is taking 170mb
> 1 instance of SchemaIndex is taking 56Mb
> 4 instances of SynonymMap is taking 112mb
> 
> There is no searching going on during this index update process.
> 
> Any ideas what on earth is going on?  Like I said my May version did
> this
> without any problems whatsoever.
> 
> 
 Had any searching gone on though? Even if its not occurring during the
 indexing, you will still have the data structure loaded if searches had
 occurred.
 
 What heap size do you have - that doesn't look like much data to me ...
 
 --
 - Mark
 
 http://www.lucidimagination.com
 
 
 
>>> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-Trunk-Heap-Space-Issues-tp25701422p25752521.html
Sent from the Solr - User mailing list archive at Nabble.com.



SOLR-1395 integration with katta. Question about Katta's ranking among shards and IDF's

2009-10-09 Thread Marc Sturlese

Hey there,
I am trying to set up the Katta integration plugin. I would like to know if
Katta's ranking algorithm is used when searching among shards. If so,
would that mean it solves the problem with IDFs in distributed Solr?
-- 
View this message in context: 
http://www.nabble.com/SOLR-1395-integration-with-katta.-Question-about-Katta%27s-ranking-among-shards-and-IDF%27s-tp25819241p25819241.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: number of Solr indexes per Tomcat instance

2009-10-23 Thread Marc Sturlese

Are you using one single solr instance with multicore or multiple solr
instances with one index each?

Erik_l wrote:
> 
> Hi,
> 
> Currently we're running 10 Solr indexes inside a single Tomcat6 instance.
> In the near future we would like to add another 30-40 indexes to every
> Tomcat instance we host. What are the factors we have to take into account
> when planning for such deployments? Obviously we do know the sizes of the
> indexes but for example how much memory does Solr need to be allocated
> given that each index is treated as a webapp in Tomcat. Also, do you know
> if Tomcat has got a limit in number of apps that can be deployed (maybe I
> should ask this questions in a Tomcat forum). 
> 
> Thanks
> E
> 

-- 
View this message in context: 
http://www.nabble.com/number-of-Solr-indexes-per-Tomcat-instance-tp26027238p26027304.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: number of Solr indexes per Tomcat instance

2009-10-23 Thread Marc Sturlese

Probably multicore would give you better performance... I think the most
important factors to take into account are the size of the indexes and the
traffic you have to handle. With enough RAM you can hold 40 cores in a
single Solr instance (or even more), but depending on the traffic you have to
handle you may suffer from slow response times.

Erik_l wrote:
> 
> We're not using multicore. Today, one Tomcat instance host a number of
> indexes in form of 10 Solr indexes (10 individual war files).
> 
> 
> Marc Sturlese wrote:
>> 
>> Are you using one single solr instance with multicore or multiple solr
>> instances with one index each?
>> 
>> Erik_l wrote:
>>> 
>>> Hi,
>>> 
>>> Currently we're running 10 Solr indexes inside a single Tomcat6
>>> instance. In the near future we would like to add another 30-40 indexes
>>> to every Tomcat instance we host. What are the factors we have to take
>>> into account when planning for such deployments? Obviously we do know
>>> the sizes of the indexes but for example how much memory does Solr need
>>> to be allocated given that each index is treated as a webapp in Tomcat.
>>> Also, do you know if Tomcat has got a limit in number of apps that can
>>> be deployed (maybe I should ask this questions in a Tomcat forum). 
>>> 
>>> Thanks
>>> E
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/number-of-Solr-indexes-per-Tomcat-instance-tp26027238p26028437.html
Sent from the Solr - User mailing list archive at Nabble.com.



keep index in production and snapshots in separate phisical disks

2009-10-23 Thread Marc Sturlese

Is there any way to make snapinstaller install the index in
snapshot20091023124543 (for example) from another disk? I am asking this
because I would prefer not to optimize the index on the master (if I do that
it takes a long time to send it via rsync since it is so big). This way I would
just have to send the new segments.
On the slave I would have 2 physical disks. Snappuller would send the
snapshot to one disk (here the index would not be optimized). Snapinstaller
would install the snapshot on the other disk, optimize it and open the
new IndexReader. The optimization should be done on the disk which contains
the "not in production" index so as not to affect search request speed.
Any idea what I should hack to reach this goal, in case it is possible?
-- 
View this message in context: 
http://www.nabble.com/keep-index-in-production-and-snapshots-in-separate-phisical-disks-tp26029666p26029666.html
Sent from the Solr - User mailing list archive at Nabble.com.



distributed facet dates

2009-11-10 Thread Marc Sturlese

Hey there,
I am thinking of developing date faceting for distributed search but I don't
know exactly where to start. I am familiar with the facet dates source code and I
think if I could understand how distributed facet queries work it shouldn't be
that difficult.
I have read http://wiki.apache.org/solr/WritingDistributedSearchComponents
but I'm missing some info.
Could anyone point me to how I could start?

Thanks in advance

-- 
View this message in context: 
http://old.nabble.com/distributed-facet-dates-tp26282343p26282343.html
Sent from the Solr - User mailing list archive at Nabble.com.



error with multicore CREATE action

2009-11-23 Thread Marc Sturlese

Hey there,
I am using Solr 1.4 out of the box and am trying to create a core at runtime
using the CREATE action.
I am getting this error when executing:
http://localhost:8983/solr/admin/cores?action=CREATE&name=x&instanceDir=x&persist=true&config=solrconfig.xml&schema=schema.xml&dataDir=data

Nov 23, 2009 6:18:44 PM org.apache.solr.core.SolrResourceLoader <init>
INFO: Solr home set to 'solr/x/'
Nov 23, 2009 6:18:44 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error executing default
implementation of CREATE
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:250)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:111)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml'
in classpath or 'solr/x/conf/',
cwd=/home/smack/Desktop/apache-solr-1.4.0/example
at
org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:260)
at
org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:228)
at org.apache.solr.core.Config.<init>(Config.java:101)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:130)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:405)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:245)
... 21 more

I don't know if I am missing something. Should I manually create the folders
and the schema and solrconfig files?

-- 
View this message in context: 
http://old.nabble.com/error-with-multicore-CREATE-action-tp26482255p26482255.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr+jetty logging to syslog?

2009-11-26 Thread Marc Sturlese

With 1.4

-Add log4j jars to Solr

-Configure the SyslogAppender with something like this (and remember to point a logger at the appender, e.g. log4j.rootLogger=INFO, solrLog):
log4j.appender.solrLog=org.apache.log4j.net.SyslogAppender
log4j.appender.solrLog.Facility=LOCAL0
log4j.appender.solrLog.SyslogHost=127.0.0.1
log4j.appender.solrLog.layout=org.apache.log4j.PatternLayout
log4j.appender.solrLog.layout.ConversionPattern=solr: %-4r [%t] %-5p %c -
%m%n

-Install syslog-ng and let syslog accept udp packets. To do that uncomment
in syslog-ng.conf the line
 udp();
in
# all known message sources
source s_all {





Otis Gospodnetic wrote:
> 
> Not many people do that, judging from
> http://www.google.com/search?&q=+solr%20+syslogd .
> 
> But I think this is really not a Solr-specific question.  Isn't the
> question really "how do I configure log4j to log to syslogd?".  Oh, and
> then "how do I configure slf4j to use log4j?"
> 
> The answer to the first one is "by using SyslogAppender" (google says so)
> The answer to the second one might be on
> http://fernandoribeiro.eti.br/2006/05/24/how-to-use-slf4j-with-log4j/
>  
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> 
> 
> 
> - Original Message 
>> From: Steve Conover 
>> To: solr-user@lucene.apache.org
>> Sent: Sat, November 21, 2009 4:09:57 PM
>> Subject: Re: solr+jetty logging to syslog?
>> 
>> Does no one send solr logging to syslog?
>> 
>> On Thu, Nov 19, 2009 at 5:54 PM, Steve Conover wrote:
>> > The solution involves slf4j to log4j to syslog (at least, for solr),
>> > but I'm having some trouble stringing all the parts together.  If
>> > anyone is doing this, would you mind posting how you use slf4j-log4j
>> > jar, what your log4j.properties looks like, what your java system
>> > properties settings are, and anything else you think is relevant?
>> >
>> > Much appreciated
>> >
>> > -Steve
>> >
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/solr%2Bjetty-logging-to-syslog--tp26437295p26531505.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sanity check on numeric types and which of them to use

2009-12-05 Thread Marc Sturlese

And what about:

vs.


Which is the difference between the two? Is bcdint always better?
Thanks in advance


Yonik Seeley-2 wrote:
> 
> On Fri, Dec 4, 2009 at 7:38 PM, Jay Hill  wrote:
>> 1) Is there any benefit to using the "int" type as a TrieIntField w/
>> precisionStep=0 over the "pint" type for simple ints that won't be sorted
>> or
>> range queried?
> 
> No.  But given that people could throw in a random range query and
> have it work correctly with a trie based int (vs a plain int), seems
> reason enough to prefer it.
> 
>> 2) In 1.4, what type is now most efficient for sorting?
> 
> trie and plain should be pretty equivalent (trie might be slightly
> faster to uninvert the first time).  Both take up less memory in the
> field cache than sint.
> 
>> 3) The only reason to use a "sint" field is for backward compatibility
>> and/or to use sortMissingFirst/SortMissingLast, correct?
> 
> I believe so.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Sanity-check-on-numeric-types-and-which-of-them-to-use-tp26651725p26655009.html
Sent from the Solr - User mailing list archive at Nabble.com.



About fsv (sort field falues)

2009-12-08 Thread Marc Sturlese

I am tracing QueryComponent.java and would like to know the purpose of the doFSV
function. I don't understand what fsv (sort field values) are for.
I have tried some queries with fsv=true and some extra info appears in the
response:



But I don't know what it is for and can't find much info out there. I read:
// The query cache doesn't currently store sort field values, and SolrIndexSearcher doesn't
// currently have an option to return sort field values.  Because of this, we
// take the documents given and re-derive the sort values.
Is it for caching purposes?
Thanks in advance!

-- 
View this message in context: 
http://old.nabble.com/About-fsv-%28sort-field-falues%29-tp26700729p26700729.html
Sent from the Solr - User mailing list archive at Nabble.com.



UpdateRequestProcessor to avoid documents of being indexed

2009-12-10 Thread Marc Sturlese

Hey there,
Once a document has been created, I need to be able to decide whether I want it
to be indexed or not. I have thought of implementing an UpdateRequestProcessor
to do that, but I don't know how to tell Solr in the processAdd method to skip
the document.
If I delete all the fields, would it be skipped, or is there a better way to
reach this goal?
Thanks in advance.
-- 
View this message in context: 
http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725534.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: UpdateRequestProcessor to avoid documents of being indexed

2009-12-10 Thread Marc Sturlese

Do you mean something like?:

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        boolean addDocToIndex = dealWithSolrDocFields(cmd.getSolrInputDocument());
        if (next != null && addDocToIndex) {
            next.processAdd(cmd);
        } else {
            LOG.debug("Doc skipped!");
        }
    }

Thanks in advance



Chris Male wrote:
> 
> Hi,
> 
> If your UpdateRequestProcessor does not forward the AddUpdateCommand onto
> the RunUpdateProcessor, I believe the document will not be indexed.
> 
> Cheers
> 
> On Thu, Dec 10, 2009 at 12:09 PM, Marc Sturlese
> wrote:
> 
>>
>> Hey there,
>> I need that once a document has been created be able to decide if I want
>> it
>> to be indexed or not. I have thought in implement an
>> UpdateRequestProcessor
>> to do that but don't know how to tell Solr in the processAdd void to skip
>> the document.
>> If I delete all the field would it be skiped or is there a better way to
>> reach this goal?
>> Thanks in advance.
>> --
>> View this message in context:
>> http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725534.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Chris Male | Software Developer | JTeam BV.| T: +31-(0)6-14344438 |
> www.jteam.nl
> 
> 

-- 
View this message in context: 
http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725698.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: UpdateRequestProcessor to avoid documents of being indexed

2009-12-10 Thread Marc Sturlese

Yes, it did
Cheers

Chris Male wrote:
> 
> Hi,
> 
> Yeah thats what I was suggesting.  Did that work?
> 
> On Thu, Dec 10, 2009 at 12:24 PM, Marc Sturlese
> wrote:
> 
>>
>> Do you mean something like?:
>>
>>@Override
>>public void processAdd(AddUpdateCommand cmd) throws IOException {
>>boolean addDocToIndex
>> =dealWithSolrDocFields(cmd.getSolrInputDocument()) ;
>>if (next != null && addDocToIndex) {
>>next.processAdd(cmd);
>>} else {
>> LOG.debug("Doc skipped!") ;
>>}
>>}
>>
>> Thanks in advance
>>
>>
>>
>> Chris Male wrote:
>> >
>> > Hi,
>> >
>> > If your UpdateRequestProcessor does not forward the AddUpdateCommand
>> onto
>> > the RunUpdateProcessor, I believe the document will not be indexed.
>> >
>> > Cheers
>> >
>> > On Thu, Dec 10, 2009 at 12:09 PM, Marc Sturlese
>> > wrote:
>> >
>> >>
>> >> Hey there,
>> >> I need that once a document has been created be able to decide if I
>> want
>> >> it
>> >> to be indexed or not. I have thought in implement an
>> >> UpdateRequestProcessor
>> >> to do that but don't know how to tell Solr in the processAdd void to
>> skip
>> >> the document.
>> >> If I delete all the field would it be skiped or is there a better way
>> to
>> >> reach this goal?
>> >> Thanks in advance.
>> >> --
>> >> View this message in context:
>> >>
>> http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725534.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>> > --
>> > Chris Male | Software Developer | JTeam BV.| T: +31-(0)6-14344438 |
>> > www.jteam.nl
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725698.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Chris Male | Software Developer | JTeam BV.| www.jteam.nl
> 
> 

-- 
View this message in context: 
http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26726566.html
Sent from the Solr - User mailing list archive at Nabble.com.



tire fields and sortMissingLast

2009-12-21 Thread Marc Sturlese

Should sortMissingLast param be working on trie-fields?

-- 
View this message in context: 
http://old.nabble.com/tire-fields-and-sortMissingLast-tp26873134p26873134.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: suggestions for DIH batchSize

2009-12-23 Thread Marc Sturlese

If you want to retrieve a huge volume of rows you will end up with an
OutOfMemoryException due to the JDBC driver. Setting batchSize to -1 in your
data-config.xml (which internally sets it to Integer.MIN_VALUE) will make
the query execute in streaming mode, avoiding the memory exception.
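For MySQL this boils down to the driver's streaming mode. Roughly what happens inside DIH's JdbcDataSource is something like the following (a sketch from memory, not the exact code):

    // MySQL Connector/J only streams rows when the statement is forward-only,
    // read-only and the fetch size is Integer.MIN_VALUE; otherwise it buffers
    // the whole result set in the heap.
    Statement stmt = connection.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);  // this is what batchSize=-1 translates to
    ResultSet rs = stmt.executeQuery(query);
    while (rs.next()) {
        // rows are read one at a time instead of being loaded all at once
    }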

Joel Nylund wrote:
> 
> Hi,
> 
> it looks like from looking at the code the default is 500, is the  
> recommended setting for this?
> 
> Has anyone notice any significant performance/memory tradeoffs by  
> making this much bigger?
> 
> thanks
> Joel
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/suggestions-for-DIH-batchSize-tp26894539p26897636.html
Sent from the Solr - User mailing list archive at Nabble.com.



Customize solr query

2008-11-02 Thread Marc Sturlese

Hey there,
I would like to customize my query this way:
 I want to give a higher boost to results that match the query executed as a
quoted phrase, and a lower boost to results matching without the quotes. I
have this done in my own Lucene app (coding straight against Lucene). I am trying
to migrate to Solr but don't know how to do this quoted-phrase boosting. I would like
to do something like this:
...title:"+query_string+" (setting boosting 3) and title:+query_string+
(setting boosting 2)...
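In my own Lucene app the query is built roughly like this (just a sketch of what I currently do; the field name "title", whitespace splitting and lowercasing are assumptions that depend on the analyzer):

    // exact phrase on the whole query string, boosted higher
    PhraseQuery phrase = new PhraseQuery();
    for (String word : queryString.toLowerCase().split("\\s+")) {
        phrase.add(new Term("title", word));
    }
    phrase.setBoost(3f);

    // the same words as individual terms, boosted lower
    Query terms = new QueryParser("title", analyzer).parse(queryString);
    terms.setBoost(2f);

    BooleanQuery combined = new BooleanQuery();
    combined.add(phrase, BooleanClause.Occur.SHOULD);
    combined.add(terms, BooleanClause.Occur.SHOULD);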
I suppose I have to add something to the solrconfig.xml but couldn't find
what.
Any advice?
Thanks in advance

Marc Sturlese
-- 
View this message in context: 
http://www.nabble.com/Customize-solr-query-tp20293029p20293029.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Getting a document by primary key

2008-11-02 Thread Marc Sturlese

Hey there,
I am doing the same and I am experiencing some trouble. I get the document
data by searching by term. The problem is that when I do it several times
(inside a huge for loop) the app keeps increasing its memory use until
almost all the memory is used...
Did you find any other way to do it?


Jonathan Ariel wrote:
> 
> I'm developing my own request handler and given a document primary key I
> would like to get it from the index.
> Which is the best and fastest way to do this? I will execute this request
> handler several times and this should work really fast.
> Sorry if it's a basic question.
> 
> Thanks!
> 
> Jonathan
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Getting-a-document-by-primary-key-tp20072108p20295436.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Getting a document by primary key

2008-11-03 Thread Marc Sturlese

Hey there,
I never run out of memory but I think the app always runs at the limit... The
problem seems to be in here (searching by term):
try {
indexSearcher = new IndexSearcher(path_index) ;
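// note: a brand-new IndexSearcher is opened here (and closed in the finally block) on every call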

QueryParser queryParser = new QueryParser("id_field",
getAnalyzer(stopWordsFile)) ;
Query query = queryParser.parse(query_string) ;

Hits hits = indexSearcher.search(query) ;

if(hits.length() > 0) {
doc = hits.doc(0) ;
}

} catch (Exception ex) {

} finally {
if(indexSearcher != null) {
try {
indexSearcher.close() ;
} catch(Exception e){} ;
indexSearcher = null ;
}
}

As Hits is deprecated I tried to use TermDocs and TopDocs... but the memory
problem never disappeared...
If I call the garbage collector every time I run the code above, the memory
doesn't increase indefinitely but... the app works very slowly.
Any suggestion?
Thanks for replying!


Yonik Seeley wrote:
> 
> On Sun, Nov 2, 2008 at 8:09 PM, Marc Sturlese <[EMAIL PROTECTED]>
> wrote:
>> I am doing the same and I am experimenting some trouble. I get the
>> document
>> data searching by term. The problem is that when I do it several times
>> (inside a huge for) the app starts increasing the memory use until I use
>> almost the whole memory...
> 
> That just sounds like the way Java's garbage collection tends to
> work... do you ever run out of memory (and get an exception)?
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Getting-a-document-by-primary-key-tp20072108p20309245.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Getting a document by primary key

2008-11-03 Thread Marc Sturlese

Hey, you are right.
I'm trying to migrate my app to Solr. For the moment I am using Solr for the
searching part of the app but I am using my own Lucene app for indexing.
I should have posted this issue to the Lucene list. Sorry about that.
I am trying to use TermDocs properly now.
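Roughly what I am trying now, in case it is useful to someone else (just a sketch, assuming a long-lived IndexReader and an untokenized "id_field"):

    // look up a single document by primary key via TermDocs,
    // reusing one IndexReader instead of opening a new IndexSearcher per call
    TermDocs termDocs = reader.termDocs(new Term("id_field", key));
    try {
        if (termDocs.next()) {
            Document doc = reader.document(termDocs.doc());
            // use the document here
        }
    } finally {
        termDocs.close();
    }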
Thanks for your advice.

Marc

Yonik Seeley wrote:
> 
> On Mon, Nov 3, 2008 at 2:49 PM, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
>> Is this your code or something from Solr?
>> That indexSearcher = new IndexSearcher(path_index) ; is very suspicious
>> looking.
> 
> Good point... if this is a Solr plugin, then get the SolrIndexSearcher
> from the request object.
> If it's not Solr, then use termenum/termdocs (and post to the right list
> ;-)
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Getting-a-document-by-primary-key-tp20072108p20310224.html
Sent from the Solr - User mailing list archive at Nabble.com.



Using DataImportHandler with mysql database

2008-11-10 Thread Marc Sturlese

Hey there, 
I am trying to use the DataImportHandler to index data from a MySQL
database. I keep getting the same error every time I start Tomcat:

Nov 10, 2008 7:39:49 PM org.apache.solr.handler.dataimport.DataImporter
loadDataConfig
INFO: Data Configuration loaded successfully
Nov 10, 2008 7:39:49 PM org.apache.solr.handler.dataimport.DataImportHandler
inform
SEVERE: Exception while loading DataImporter
java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:95)
at
org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
...

I am using the official release of Solr 1.3. First I tried to add the
compiled DataImportHandler jar. As that didn't work, what I did was:
I downloaded the package org/apache/solr/handler/dataimport from a nightly
build and added it to, and compiled it with, my official Solr 1.3 source release.
This way I have my Solr 1.3 release with the DataImportHandler.

In solrconfig.xml I have created a request handler to make the import:




/path_to_/data-config.xml

  

To connect to the database, in data-config.xml I am doing:

...and here I
do the select and the mapping db_field - index_field

*The mysql connector is correctly added in the classpath

I think I must be missing something in my configuration but can't find
what...
Can anyone give me a hand? I am a bit lost with this problem...
Thanks in advance

Marc Sturlese



-- 
View this message in context: 
http://www.nabble.com/Using-DataImportHandler-with-mysql-database-tp20425791p20425791.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using DataImportHandler with mysql database

2008-11-11 Thread Marc Sturlese

That worked! I was writing the <document> tag in a bad way.
> It seems like your data-config does not have any  tag. The
> following is the correct structure:
> 
> 
>   
> 
>   
> 
> 
> On Tue, Nov 11, 2008 at 12:31 AM, Marc Sturlese
> <[EMAIL PROTECTED]>wrote:
> 
>>
>> Hey there,
>> I am trying to use the DataImportHandler to index data from a mysql
>> database. I am having the same error all the time just when I start
>> tomcat:
>>
>> Nov 10, 2008 7:39:49 PM
>> org.apache.solr.handler.dataimport.DataImportHandler
>> processConfiguration
>> INFO: Processing configuration from solrconfig.xml:
>> {config=/path_to/data-config.xml}
>> Nov 10, 2008 7:39:49 PM org.apache.solr.handler.dataimport.DataImporter
>> loadDataConfig
>> INFO: Data Configuration loaded successfully
>> Nov 10, 2008 7:39:49 PM
>> org.apache.solr.handler.dataimport.DataImportHandler
>> inform
>> SEVERE: Exception while loading DataImporter
>> java.lang.NullPointerException
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:95)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
>> ...
>>
>> I am using the oficial release of solar 1.3. First I tried to add the
>> compiled DataImportHandler jar. As it didn't work what I did was:
>> I downloaded the package org/apache/solr/handler/dataimport from a
>> nightly
>> build and have added it and compiled to my solr 1.3 oficial source
>> release.
>> This way I have my solr1.3 release with the DataImporthandler
>>
>> In solrconfig.xml I have created a request handler to make the import:
>>
>> > class="org.apache.solr.handler.dataimport.DataImportHandler">
>>
>>
>>/path_to_/data-config.xml
>>
>>  
>>
>> To connect to the database , in data-config.xml I am doing:
>> 
>>   > url="jdbc:mysql://localhost/db_name" user="root" password=""/> ...and
>> here
>> I
>> do the select and the mapping db_field - index_field
>>
>> *The mysql connector is correctly added in the classpath
>>
>> I think I must be missing something in my configuration but can't find
>> what...
>> Anyone can give me a hand? I am a bit lost with this problem...
>> Thanks in advanced
>>
>> Marc Sturlese
>>
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Using-DataImportHandler-with-mysql-database-tp20425791p20425791.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-DataImportHandler-with-mysql-database-tp20425791p20435463.html
Sent from the Solr - User mailing list archive at Nabble.com.



deduplication & dataimporthandler

2008-11-11 Thread Marc Sturlese

Hey there,
Is there any way to use the DataImportHandler together with deduplication, just
using XML configuration?

I have read that deduplication (http://wiki.apache.org/solr/Deduplication)
is meant to be used with the handler named /update (which uses the
solr.XmlUpdateRequestHandler class).

If there's no other way I will go into the DataImportHandler source, but I
would like to know if it can be done via configuration...
I am thinking of adding something like:




true
field1,field2

  org.apache.solr.update.processor.TextProfileSignature

signatureField
 


  

Inside my request handler called /dataimport (which uses the
org.apache.solr.handler.dataimport.DataImportHandler class).

Has anyone done something similar?

Marc Sturlese

-- 
View this message in context: 
http://www.nabble.com/deduplication---dataimporthandler-tp20437553p20437553.html
Sent from the Solr - User mailing list archive at Nabble.com.



indexing data and deleting from index and database

2008-11-12 Thread Marc Sturlese

Hey there,
For a few weeks now I have been trying to migrate my Lucene core app to Solr, and
many questions are coming to mind...
Before attending ApacheCon I thought that my Lucene index would work fine with my
Solr search engine, but after my conversation with Erik at the Solr BootCamp
I understood that the structure of the fields in the Solr index is
different, especially regarding analysis.

Now, I want to use Solr to index too and I have some questions:
The first thing I do when I launch the indexer is to delete a lot of
documents that I have marked in a db with a field delete=1 and that I had
previously indexed in the Lucene index.
Once that is done, I also delete those documents from the DB.
After that, I index some docs from the same DB (the 100,000 newest docs and
some other modified ones).

To do the migration I have started using the DataImportHandler (with
JdbcDataSource) with delta-import to add new documents. The thing is that I
cannot find a way to delete the rows from the DB, nor the docs from my
index, with the DataImportHandler.

Is implementing a custom DataSource the best way to do this task?
Is there a better way?

Thanks for everything!!!
-- 
View this message in context: 
http://www.nabble.com/indexing-data-and-deleting-from-index-and-database-tp20466411p20466411.html
Sent from the Solr - User mailing list archive at Nabble.com.



troubles with delta import

2008-11-14 Thread Marc Sturlese

Hey there, I am using DIH with full-import successfully, but there's no
way to make it work with delta-import. Apparently Solr doesn't show any error,
but it does not do what it is supposed to.
I think the problem is with dataimport.properties because it is never
updated. I have it placed in the same folder as solrconfig.xml and
schema.xml and the write permissions are set properly. What makes me
doubt is that I couldn't find anywhere to tell Solr the path of this file.
I don't know if Solr is supposed to find it automatically.

My data-config.xml looks like this:









*I have in the rows of the table a timestamp field called dt_last_modified

Another thing I can't quite understand is why I have to put both the query and
the deltaQuery... why isn't just the deltaQuery (with more fields in the select)
enough?

After the execution everything seems to go ok (even with debug and
verbose mode) but no docs have changed and dataimport.properties is not
updated...

Any suggestion? Have done many tests but no way...

-- 
View this message in context: 
http://www.nabble.com/troubles-with-delta-import-tp20498449p20498449.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: troubles with delta import

2008-11-14 Thread Marc Sturlese

Hey Shalin,
I have tried 2 methods:
1. First doing a full-import and then a delta-import.
2. Starting directly with the delta-import.

In neither case is the dataimport.properties file updated. I
have it placed in the same folder as schema.xml, data-config.xml and
solrconfig.xml (which is where I think it must be placed according to what I
understand from the wiki). Is that correct?
It's the only thing I think I might be missing...
Thanks in advance


Shalin Shekhar Mangar wrote:
> 
> Hi Marc,
> 
> Did you do a full-import first? If not, no value for last import time is
> written and the delta query may fail. We should fix this to use a sane
> default so that people do not need to full import first.
> 
> You need to put both because we support both full and delta, both of which
> need different kinds of queries and we cannot decide what you are going to
> use.
> 
> On Fri, Nov 14, 2008 at 4:35 PM, Marc Sturlese
> <[EMAIL PROTECTED]>wrote:
> 
>>
>> Hey there, I am using dataimport with full-import successfully but
>> there's
>> no
>> way do make it work with delta-import. Aparently solr doesn't show any
>> error
>> but it does not do what it is supose to.
>> I thing the problme is with dataimport.properties because it is never
>> updated. I have it placed in the same folder as solrconfig.xml and
>> schema.xml and the writing permissions are set propertly. What makes me
>> doubt is that couldn't find anywhere to tell solr the path of this file.
>> Don't know if solr is suposed to find it automatically.
>>
>> My data-config.xml looks like this:
>> 
>>> url="jdbc:mysql://path_db" user="user" password="pwd"/>
>>
>>
>>
>>
>>
>>
>> 
>> *I have in the rows of the table a timestamp field called
>> dt_last_modified
>>
>> Other thing that can't exactly understant is why i have to put the query
>> and
>> delta-query... why just with deltaquery (with more fields in the select)
>> is
>> not enough?
>>
>> After the ejecution everything seems to go ok (even with the debug and
>> verbose mode) but no docs have changed and dataimport.properties is not
>> updated...
>>
>> Any suggestion? Have done many tests but no way...
>>
>> --
>> View this message in context:
>> http://www.nabble.com/troubles-with-delta-import-tp20498449p20498449.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/troubles-with-delta-import-tp20498449p20500510.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: troubles with delta import

2008-11-14 Thread Marc Sturlese

Hey,
That's the weird thing... in the log everything seems to work fine:

Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImportHandler
processConfiguration
INFO: Processing configuration from solrconfig.xml:
{config=/opt/netbeans-5.5.1/enterprise3/apache-tomcat-5.5.17/bin/solr/conf/data-config.xml}
Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImporter
loadDataConfig
INFO: Data Configuration loaded successfully
Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
INFO: Starting Delta Import
Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity homes_tbl_ads with URL:
jdbc:mysql://localhost/path_db
Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 11
Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DocBuilder
execute
INFO: Time taken = 0:0:0.47
Nov 14, 2008 3:12:46 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr_web path=/dataimport
params={verbose=true&command=delta-import&debug=on} status=0 QTime=130 
Nov 14, 2008 3:12:46 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr_web path=/dataimport params={command=show-config}
status=0 QTime=0 

I am calling the dataimport this way:
http://...dataimport?command=full-import&debug=on&verbose=true
http://...dataimport?command=delta-import&debug=on&verbose=true

In delta-import I am getting this aoutput with the verbose debug:

...
delta-import
debug

...
lst name="statusMessages">
1
10
0
2008-11-14 15:12:46
0:0:0.47


It also shows the changes in the rows in the verbose debug output, but
nothing changes in the index when I check it with Luke.
I keep thinking that something is wrong because the dataimport.properties is not
being created... but I can't find why :(

solrconfig.xml:
 


/opt/netbeans-5.5.1/enterprise3/apache-tomcat-5.5.17/bin/solr/conf/data-config.xml

  

data-config.xml:











Thanks a lot



-- 
View this message in context: 
http://www.nabble.com/troubles-with-delta-import-tp20498449p20501450.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: troubles with delta import

2008-11-14 Thread Marc Sturlese

Hey Shalin!
Now at least I am getting some errors in the log file :D... Hope now I will
be able to find the problem.
Thanks for everything!


Shalin Shekhar Mangar wrote:
> 
> Ok I found the problem.
> 
> In debug mode, DataImportHandler does not commit documents since it is
> meant
> for debugging only. If you want to do a commit, add commit=true as a
> request
> parameter.
> 
> On Fri, Nov 14, 2008 at 7:56 PM, Marc Sturlese
> <[EMAIL PROTECTED]>wrote:
> 
>>
>> Hey,
>> That's the weird thing... in the log everything seems to work fine:
>>
>> Nov 14, 2008 3:12:46 PM
>> org.apache.solr.handler.dataimport.DataImportHandler
>> processConfiguration
>> INFO: Processing configuration from solrconfig.xml:
>>
>> {config=/opt/netbeans-5.5.1/enterprise3/apache-tomcat-5.5.17/bin/solr/conf/data-config.xml}
>> Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImporter
>> loadDataConfig
>> INFO: Data Configuration loaded successfully
>> Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImporter
>> doDeltaImport
>> INFO: Starting Delta Import
>> Nov 14, 2008 3:12:46 PM
>> org.apache.solr.handler.dataimport.JdbcDataSource$1
>> call
>> INFO: Creating a connection for entity homes_tbl_ads with URL:
>> jdbc:mysql://localhost/path_db
>> Nov 14, 2008 3:12:46 PM
>> org.apache.solr.handler.dataimport.JdbcDataSource$1
>> call
>> INFO: Time taken for getConnection(): 11
>> Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DocBuilder
>> execute
>> INFO: Time taken = 0:0:0.47
>> Nov 14, 2008 3:12:46 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr_web path=/dataimport
>> params={verbose=true&command=delta-import&debug=on} status=0 QTime=130
>> Nov 14, 2008 3:12:46 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr_web path=/dataimport params={command=show-config}
>> status=0 QTime=0
>>
>> I am calling the dataimport this way:
>> http://...dataimport?command=full-import&debug=on&verbose=true
>> http://...dataimport?command=delta-import&debug=on&verbose=true
>>
>> In delta-import I am getting this aoutput with the verbose debug:
>>
>> ...
>> delta-import
>> debug
>> 
>> ...
>> lst name="statusMessages">
>> 1
>> 10
>> 0
>> 2008-11-14 15:12:46
>> 0:0:0.47
>> 
>>
>> It also shows the changes in the rows in the output of the verbose debug
>> but
>> nothing change in the index when I check it with Luke.
>> I keep thinking that something is wrong coz the import.properties it is
>> not
>> being created... but can't find why :(
>>
>> solrconfig.xml:
>>  > class="org.apache.solr.handler.dataimport.DataImportHandler"
>> default="false">
>>
>>
>>>
>> name="config">/opt/netbeans-5.5.1/enterprise3/apache-tomcat-5.5.17/bin/solr/conf/data-config.xml
>>
>>  
>>
>> data-config.xml:
>>
>> 
>>> url="jdbc:mysql://localhost/trovit_es" user="root" password=""/>
>>
>>
>>
>>
>>
>>
>> 
>>
>> Thanks a lot
>>
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/troubles-with-delta-import-tp20498449p20501450.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/troubles-with-delta-import-tp20498449p20502269.html
Sent from the Solr - User mailing list archive at Nabble.com.



using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese

Hey there,

I have posted about my situation before but I think my explanation
was a bit confusing...
I am using the DataImportHandler with delta-import and it's working perfectly. I
have also coded my own SqlEntityProcessor to delete expired rows from the index
and database.

Now I need to do duplication control at indexing time. In my old Lucene core
I wrote my own duplication control, but it was very slow as it worked by comparing
strings... I have been investigating Solr deduplication
(http://wiki.apache.org/solr/Deduplication) and it looks great, as it works
with hashes instead of strings.

I have learned how to use deduplication using the /update requestHandler as
the wiki says:
 

  dedupe

  

But the thing is that I want to use it with the /dataimport requestHandler
(the one used by the DataImportHandler). I don't know if there's a possible XML
configuration to add deduplication to the DataImportHandler, or whether I should
code a plugin... and in that case, I don't exactly know where.

Hope my explanation is clearer now...
Thanks in advance!


-- 
View this message in context: 
http://www.nabble.com/using-deduplication-with-dataimporthandler-tp20536053p20536053.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese

Thank you so much. I have it sorted.
I am wondering now if there is a more stable way to use deduplication than
adding this patch to the Solr source project:
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
(SOLR-799.patch, 2008-11-12 05:10 PM, that one exactly).

I have downloaded the latest nightly-build source code and couldn't see the
needed classes in there.
Does anyone know anything about this? Should I ask on the developers list?

Thanks in advance


Marc Sturlese wrote:
> 
> Hey there,
> 
> I have posted before telling about my situation but I thing my explanation
> was a bit confusing...
> I am using dataImportHanlder and delta-import and it's working perfectly.
> I have also coded my own SqlEntityProcesor to delete from the index and
> database expired rows.
> 
> Now I need to do duplication control at indexing time. In my old lucene
> core I made my own duplication control but it was so slow as it worked
> comparing strings... I have been investigating solr deduplication
> (http://wiki.apache.org/solr/Deduplication) and it seems so cool as it
> works with hashes instead of strings.
> 
> I have learned how to use deduplication using the /update requestHandler
> as the wiki says:
> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.processor">dedupe</str>
>   </lst>
> </requestHandler>
> 
> But the thing is that I want to use it with the /dataimport requestHanlder
> (the one used by dataimporthandler). I don't know if there's a possible
> xml configuration to add deduplication to dataimportHandler or I should
> code a plugin... in that case, I don't exacly now where.
> 
> Hope my explanation is more clear now...
> Thank's in advanced!
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/using-deduplication-with-dataimporthandler-tp20536053p20538008.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese



Marc Sturlese wrote:
> 
> Thank you so much. I have it sorted.
> I am wondering now if there is any more stable way to use deduplication
> than adding to the solr source project this patch:
> https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> (SOLR-799.patch   2008-11-12 05:10 PM this one exactly).
> 
> I have downloaded the last nightly-build source code and couldn't see the
> needed classes in there.
> Anyones knows something?Should I ask this in the developers forum?
> 
> The thing is I can't find the class
> org.apache.solr.update.processor.DeduplicateUpdateProcessorFactory
> anywhere...
> 
> Thanks in advanced
> 
> 
-- 
View this message in context: 
http://www.nabble.com/using-deduplication-with-dataimporthandler-tp20536053p20538077.html
Sent from the Solr - User mailing list archive at Nabble.com.



TextProfileSigature using deduplication

2008-11-18 Thread Marc Sturlese

Hey there, I've been testing and reading the source of
TextProfileSignature.java to avoid near-duplicate entries at indexing time.
What I understood is that it is meant for large texts where the frequency of
the tokens (the words lowercased and reduced to letters and numbers, in that
case) matters. If you want to detect duplicates in short texts, without
giving much weight to the frequencies, it doesn't work...
The hash is built only from the terms whose frequency reaches a QUANTUM
(whose value is derived from the maximum frequency among all the terms). So
it will say that:

aaa sss ddd fff ggg hhh aaa kkk lll ooo
aaa xxx iii www qqq aaa jjj eee zzz nnn

are duplicates because the quantum here would be 2 and the frequency of aaa
would be 2 as well. So only the term aaa would be used to make the hash.

In this case:
aaa sss ddd fff ggg hhh kkk lll ooo
apa sss ddd fff ggg hhh kkk lll ooo

Here the quantum would be 1 and the frequencies of all terms would be 1, so
all terms would be used for the hash. It will consider these two strings not
similar.

As I understand the algorithm, there's no way to make it see that in my
second case both strings are similar. I wish I were wrong...
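
To check that I am reading the source right, this is roughly what I think it
does (a simplified standalone sketch, not the real class; the tokenizing and
helper code are mine, and quantRate=0.01 / minTokenLen=2 are the defaults the
source seems to use):

import java.security.MessageDigest;
import java.util.*;

public class ProfileSketch {

  static byte[] signature(String text, float quantRate, int minTokenLen) throws Exception {
    // count lowercase alphanumeric tokens, dropping the ones shorter than minTokenLen
    Map<String, Integer> freq = new HashMap<String, Integer>();
    for (String tok : text.toLowerCase().split("[^a-z0-9]+")) {
      if (tok.length() < minTokenLen) continue;
      Integer c = freq.get(tok);
      freq.put(tok, c == null ? 1 : c + 1);
    }
    if (freq.isEmpty()) return new byte[0];

    // the quantum depends on the maximum frequency seen in the text
    int maxFreq = Collections.max(freq.values());
    int quant = Math.round(maxFreq * quantRate);
    if (quant < 2) quant = (maxFreq > 1) ? 2 : 1;

    // terms below the quantum never reach the hash; the surviving ones get their
    // counts quantized and are ordered by descending frequency before hashing
    List<Map.Entry<String, Integer>> kept = new ArrayList<Map.Entry<String, Integer>>();
    for (Map.Entry<String, Integer> e : freq.entrySet()) {
      if (e.getValue() >= quant) kept.add(e);
    }
    Collections.sort(kept, new Comparator<Map.Entry<String, Integer>>() {
      public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
        return b.getValue() - a.getValue();
      }
    });
    StringBuilder profile = new StringBuilder();
    for (Map.Entry<String, Integer> e : kept) {
      profile.append(e.getKey()).append(' ').append((e.getValue() / quant) * quant).append(' ');
    }
    return MessageDigest.getInstance("MD5").digest(profile.toString().getBytes("UTF-8"));
  }
}

With this, in my first example only "aaa" survives the quantum and in the
second one every term does, which matches what I am seeing.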

I have my own duplicate detection for that, but it uses String comparison so
it is really slow... I would like to know if there is any way to tune
TextProfileSignature to do this.

Don't know if I should post this here or in the developers forum...

Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/TextProfileSigature-using-deduplication-tp20559155p20559155.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: TextProfileSigature using deduplication

2008-11-18 Thread Marc Sturlese

>>
>> I have my own duplication system to detect that but I use String 
>> comparison
>> so it works really slow...
>>  
What are you doing for the String comparison? Not exact right?

hey,
My comparison method looks for similar matches (not just exact ones)... what
I do is compare two texts word to word and then decide on a % of similarity,
for example:
aaa sss ddd fff ggg hhh jjj kkk lll ooo
bbb rrr ddd fff ggg hhh jjj kkk lll ooo

With an 80% similarity threshold and comparing word to word, these two
strings would be considered similar. (I split the texts into tokens and count
how many matches I get.)
(I use some stopwords and rules as well.)
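
Stripped of the stopwords and the extra rules, the idea boils down to
something like this (just a sketch, not the real code):

public class WordOverlap {

  // word-to-word comparison by position; true when the fraction of matching
  // positions reaches the threshold (e.g. 0.8 for my 80% case)
  public static boolean similar(String a, String b, double threshold) {
    String[] wa = a.toLowerCase().split("\\s+");
    String[] wb = b.toLowerCase().split("\\s+");
    int positions = Math.min(wa.length, wb.length);
    if (positions == 0) return false;
    int matches = 0;
    for (int i = 0; i < positions; i++) {
      if (wa[i].equals(wb[i])) matches++;
    }
    return (double) matches / Math.max(wa.length, wb.length) >= threshold;
  }
}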

I am going to try more tuning of the TextProfileSignature parameters as you
say.
Don't know if you remember, but I asked you about this at ApacheCon and you
told me about this 799 JIRA. If I make it work it is definitely much faster
than my system...

About deduplication... I couldn't find the class that appears in the wiki,
org.apache.solr.update.processor.DeduplicateUpdateProcessorFactory, anywhere,
so I downloaded the patch and plugged it into my Solr source (I use
org.apache.solr.update.processor.TextProfileSignature instead of the one
written in the wiki).

Would appreciate any advice about the tuning params of TextProfileSignature.

Thank you for your time



markrmiller wrote:
> 
> 
>>>
>>> I have my own duplication system to detect that but I use String 
>>> comparison
>>> so it works really slow...
>>>  
> What are you doing for the String comparison? Not exact right?
> 
> 
-- 
View this message in context: 
http://www.nabble.com/TextProfileSigature-using-deduplication-tp20559155p20560828.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: TextProfileSigature using deduplication

2008-11-20 Thread Marc Sturlese

Hey there,
I found a couple of solutions that work fine for my case (not exactly what I
was looking for at the beginning, but I could adapt to it).

First one:
Always use quantum=1 and minTokenLen=2.
Instead of ordering the tokens by frequency, I order them alphabetically;
doing this I am a little more permissive (there will be more duplicates than
when ordering by freq).
Using minTokenLen=3 could be useful here, depending on the use case.

Second one:
Order the tokens alphabetically.
Use minTokenLen=2 and quantum=2 for all fields but one; for that one I use
quantum=1. The field that uses quantum=1 makes me more restrictive, but I
know that if it doesn't match, the docs can't be considered duplicates.

These are two ways to detect duplicates in short texts. It is specific to my
case, but the idea of giving a different quantum to different fields could be
helpful in other cases as well.
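
For reference, relative to the simplified sketch I posted earlier in the
thread, the alphabetical ordering is just a different comparator over the
kept tokens (again only a sketch):

// order the kept profile entries alphabetically instead of by descending frequency
Collections.sort(kept, new Comparator<Map.Entry<String, Integer>>() {
  public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
    return a.getKey().compareTo(b.getKey());
  }
});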





 

Ken Krugler wrote:
> 
>>Marc Sturlese wrote:
>>>Hey there, I've been testing and checking the source of the
>>>TextProfileSignature.java to avoid similar entries at indexing time.
>>>What I understood is that it is useful for huge text where the frequency
of
>>>the tokens (the words in lowercase just with number and leters in taht
case)
>>>is important. If you want to detect duplicates in not huge text and not
>>>giving a lot of importance to the frequencies it doesn't work...
>>>The hash will be made just with the terms wich frequency is higher than a
>>>QUANTUM (which value is given in function of the max freq between all the
>>>terms). So it will say that:
>>>
>>>aaa sss ddd fff ggg hhh aaa kkk lll ooo
>>>aaa xxx iii www qqq aaa jjj eee zzz nnn
>>>
>>>are duplicates because quantum here wolud be 2 and the frequency of aaa
>>>would be 2 aswell. So, to make the hash just the term aaa would be used.
>>>
>>>In this case:
>>>aaa sss ddd fff ggg hhh kkk lll ooo
>>>apa sss ddd fff ggg hhh kkk lll ooo
>>>
>>>Here quantum would be 1 and the frequencies of all terms would be 1 so
all
>>>terms would be use for the hash. It will consider this two strings not
>>>similar.
>>>
>>>As I understood the algorithm there's no way to make it understand that
in
>>>my second case both strings are similar. I wish i were wrong...
>>>
>>>I have my own duplication system to detect that but I use String
comparison
>>>so it works really slow... Would like to know if there is any tuning
>>>possibility to do that with TextProfileSignature
>>>Don't know if I should pot this here or in the developers forum...
>>
>>Hi Marc,
>>
>>TextProfileSignature is a rather crude 
>>implementation of approximate similarity, and as 
>>you pointed out it's best suited for large 
>>texts. The original purpose of this Signature 
>>was to deduplicate web pages in large amounts of 
>>crawled pages (in Nutch), where it worked 
>>reasonably well. Its advantage is also that it's 
>>easy to compute and doesn't require multiple 
>>passes over the corpus.
>>
>>As it is implemented now, it breaks badly in the 
>>case you describe. You could modify this 
>>implementation to include also word-level 
>>ngrams, i.e. sequences of more than 1 word, up 
>>to N (e.g. 5) - this should work in your case.
>>
>>Ultimately, what you are probably looking for is 
>>a shingle-based algorithm, but it's relatively 
>>costly and requires multiple passes.
> 
> There's an intermediate approach we use...
> 
> * Generate separate hashes for each of the quantized bands
> * Create additional fingerprint values (depends on the nature of the data)
> * Find potentially similar files using the above
> * Then apply an accurate but slower comparison to determine true
> similarity
> 
> From our data, it's common to get files where 
> (due to small text changes) the frequency of a 
> term moves between quantized bands. This then 
> changes the über hash that you get from combining 
> all terms, but with 10 or so bands we still get 
> some matches on the hashes from the individual 
> bands.
> 
> The "find potentially similar files" uses a 
> simple Lucene scoring function, based on the 
> number of matching fingerprint values.
> 
> -- Ken
> --
> Ken Krugler
> Krugle, Inc.
> +1 530-210-6378
> "If you can't find it, you can't fix it"
> 
> 

-- 
View this message in context: 
http://www.nabble.com/TextProfileSigature-using-deduplication-tp20559155p20600118.html
Sent from the Solr - User mailing list archive at Nabble.com.



not string or text fields and shards

2008-11-20 Thread Marc Sturlese

Hey there,

I have started working with an index divided into 3 shards. When I did a
distributed search I got an error with the fields that were not string or
text. I read that the error was due to the BinaryResponseWriter and empty
fields that are not string/text.
I found the solution in an old thread of this forum:
http://www.nabble.com/best-way-to-debug-shard-format-errors-td19087854.html

The thing is I had to change some source code and rebuild Solr. That old
thread said this problem would be solved in Solr 1.3, which is the version I
am using, but I still hit the problem. Maybe there is a solution that doesn't
involve touching the source and that I don't know about.

Has anyone found any other solution?

Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/not-string-or-text-fields-and-shards-tp20600353p20600353.html
Sent from the Solr - User mailing list archive at Nabble.com.



idea about faceting

2008-11-22 Thread Marc Sturlese

Hey there,

I am facing a problem with field facets and I don't know if there is any
solution in Solr for it.
I want to facet on a field that holds a very small text. To do that I am
using the KeywordTokenizerFactory to keep all the words of the text in just
one token. I use LowerCaseFilterFactory so I don't miss matches because of
uppercase, and ISOLatin1AccentFilterFactory so I don't miss matches because
of accents.

The problem appears here: I would like to show the facet values with their
accents and uppercase.

In my old Lucene system (not using Solr) I used to create my facet fields
with accents, but at search time I removed the accents and uppercase manually
in Java. So I did the search without accents or uppercase but was still able
to show them later.
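
By removing them manually I mean something along these lines (just a sketch
with java.text.Normalizer, not my exact old code):

import java.text.Normalizer;

public class AccentStripper {
  // lowercase and strip accents from a value before searching (Java 6+)
  public static String normalize(String s) {
    String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
    return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "").toLowerCase();
  }
}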

I have been playing with the facet source code in Solr but can't find a way
to solve my problem...
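
The closest I can get with plain schema configuration is keeping two copies
of the field, something like the sketch below ("city" is just an example
name). Faceting would run on the normalized field, but I would still have to
map every normalized value back to a stored display value on the application
side, which is what I am trying to avoid:

<fieldType name="facetNormalized" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>

<field name="city" type="facetNormalized" indexed="true" stored="false"/>
<field name="city_display" type="string" indexed="false" stored="true"/>
<copyField source="city" dest="city_display"/>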

Does anyone have an idea about how I could reach this goal?

Thanks in advance

-- 
View this message in context: 
http://www.nabble.com/idea-about-faceting-tp20638850p20638850.html
Sent from the Solr - User mailing list archive at Nabble.com.



data import handler - going deeper...

2008-11-28 Thread Marc Sturlese

Hey there,

After developing my own classes extending SqlEntityProcessor, JdbcDataSource
and Transformer, I have my customized DataImportHandler almost working.
I have to reach one more goal.

On the one hand, I don't always have to index all the fields from my db row.
For example, fields from the db that have a null value don't have to be
indexed. Checking the source code, I see I could do that by modifying the
addFields function of DocBuilder.java.

On the other hand, I need to boost fields of a doc at indexing time (set the
boost not on the whole doc but on a few fields). I see I can do that in the
addFieldValue function of DocBuilder.java.

The thing is, I would like not to modify core classes but to do it as a
plugin. Is there any way to apply those changes using plugins, like I did
with transformers or entity processors?
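
For the null fields, I suspect a plain Transformer can already do it by
removing the keys from the row before the doc is built, something along these
lines (just a sketch, not tested):

import java.util.Iterator;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Sketch: drop the columns whose value is null so they never reach the document.
public class SkipNullFieldsTransformer extends Transformer {

  public Object transformRow(Map<String, Object> row, Context context) {
    for (Iterator<Map.Entry<String, Object>> it = row.entrySet().iterator(); it.hasNext();) {
      if (it.next().getValue() == null) {
        it.remove();
      }
    }
    return row;
  }
}

It would be wired in with the transformer attribute on the entity, like any
other custom transformer. For the field-level boost I don't see an obvious
plugin hook, so that part may really need touching DocBuilder.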

Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/data-import-handler---going-deeper...-tp20731715p20731715.html
Sent from the Solr - User mailing list archive at Nabble.com.


