SOLR deduplication

2011-01-26 Thread Jason Brown
Hi - I have the SOLR deduplication configured and working well.

Is there any way I can tell which documents have not been added to the index
as a result of deduplication rejecting subsequent identical documents?

Many Thanks

Jason Brown.


Re: SOLR deduplication

2011-01-26 Thread Markus Jelsma
Not right now:
https://issues.apache.org/jira/browse/SOLR-1909

> Hi - I have the SOLR deduplication configured and working well.
> 
> Is there any way I can tell which documents have not been added to the
> index as a result of deduplication rejecting subsequent identical
> documents?
> 
> 
> Many Thanks
> 
> Jason Brown.


Re: Use terracotta bigmemory for solr-caches

2011-01-26 Thread Martin Grotzke
On Tue, Jan 25, 2011 at 4:19 PM, Em  wrote:

>
> Hi Martin,
>
> are you sure that your GC is well tuned?
>
These are the heap-related JVM configurations for the servers running with a
17GB heap (one with the parallel collector, one with CMS):

-XX:+HeapDumpOnOutOfMemoryError -server -Xmx17G -XX:MaxPermSize=256m
-XX:NewSize=2G -XX:MaxNewSize=2G -XX:SurvivorRatio=6
-XX:+UseConcMarkSweepGC

-XX:+HeapDumpOnOutOfMemoryError -server -Xmx17G -XX:MaxPermSize=256m
-XX:NewSize=2G -XX:MaxNewSize=2G -XX:SurvivorRatio=6 -XX:+UseParallelOldGC
-XX:+UseParallelGC

Another heap configuration is running with 8GB max heap, and this search
server also has lower peaks in response times.

To me it seems that it's simply too much memory getting
allocated/collected/compacted. I'm checking how far we can reduce the cache
sizes (and the max heap) without any degradation in response times (or disk
I/O). Right now it seems that reducing the documentCache size does lower the
cache's hit ratio, but it has no negative impact on response times (nor does
I/O increase). Therefore I'd follow the path of reducing the cache sizes as
far as we can, as long as there are no negative impacts, and then check the
longest requests again to see if they're still caused by full GC cycles. Even
then they should be much shorter due to the reduced amount of memory being
collected/compacted.
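
For reference, the documentCache sizing lives in solrconfig.xml; a minimal
sketch (the values here are illustrative, not our actual settings):

    <!-- solrconfig.xml: documentCache sizing, example values only -->
    <documentCache class="solr.LRUCache"
                   size="16384"
                   initialSize="4096"
                   autowarmCount="0"/>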

So now I also think that Terracotta BigMemory is not the right solution :-)

Cheers,
Martin



> A request that needs more than a minute isn't the standard, even when I
> consider all the other postings about response-performance...
>
> Regards
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Use-terracotta-bigmemory-for-solr-caches-tp2328257p2330652.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: Weird behaviour with phrase queries

2011-01-26 Thread Jerome Renard
Hi Erick,

On Tue, Jan 25, 2011 at 1:38 PM, Erick Erickson wrote:

> Frankly, this puzzles me. It *looks* like it should be OK. One warning, the
> analysis page sometimes is a bit misleading, so beware of that.
>
> But the output of your queries makes it look like the query is parsing as
> you expect, which leaves the question of whether your index contains what
> you think it does. You might get a copy of Luke, which allows you to
> examine what's actually in your index instead of what you think is in
> there. Sometimes there are surprises here!
>
>
Bingo! Some data was not in the index; indexing it obviously fixed the
problem.


> I didn't mean to re-index your whole corpus, I was thinking that you could
> just index a few documents in a test index so you have something small to
> look at.
>
> Sorry I can't spot what's happening right away.
>
>
No worries, thanks for your support :)

-- 
Jérôme


Display analyzed values in hitlist

2011-01-26 Thread Martin Rödig
Hi,
I want to display author names in the hitlist. For the author metafield I
created a new field type, type_author, which includes a synonym list. In the
synonym list, all possible names of a person are reduced to one name.
Example:
d...@hh.de, dietmar, brock, dietmar brock, db => Dietmar
Brock

So far everything works as expected, and the facet of the author field
displays only Dietmar Brock. But the hitlist still displays the original
values like db or dietmar...
I know that Dietmar Brock exists only in the index, and that the original
string that comes in can be stored with stored="true", but I need a way to
display Dietmar Brock in the hitlist.

Is there a possibility to include regexps and synonym lists in the request
handler, or to store the "analyzed" value? What is the best solution? Must I
write my own RequestHandler?
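
For context, my type_author looks roughly like this (a sketch from memory;
the tokenizer choice and synonyms file name are approximations):

    <fieldType name="type_author" class="solr.TextField">
      <analyzer>
        <!-- whole input becomes one token, then variants map to one name -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="authors.txt"
                ignoreCase="true" expand="false"
                tokenizerFactory="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>

The synonym mapping only happens at analysis time, which is why the stored
value returned in the hitlist is still the raw input string.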

Thanks
Martin


Re: please help >>Problem with dataImportHandler

2011-01-26 Thread Ezequiel Calderara
And the answer there didn't help?
Why not copy the logs of this new error too?


Every time you encounter an error, take the time to send the log output and,
if needed, the schema.xml or the solrconfig.xml.

Thanks


On Tue, Jan 25, 2011 at 6:44 AM, Dinesh wrote:

>
>
> http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2327738.html
>
> this thread explains my problem
>
> -
> DINESHKUMAR . M
> I am neither especially clever nor especially gifted. I am only very, very
> curious.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/please-help-Problem-with-dataImportHandler-tp2318585p2327745.html
>  Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
__
Ezequiel.

http://www.ironicnet.com


Re: Highlighting with/without Term Vectors

2011-01-26 Thread Grant Ingersoll

On Jan 24, 2011, at 2:42 PM, Salman Akram wrote:

> Hi,
> 
> Does anyone have any benchmarks for how much highlighting speeds up with
> Term Vectors (compared to without them)? E.g. if highlighting on 20
> documents takes 1 sec with Term Vectors, any idea how long it will take
> without them?
> 
> I need to know since the index used for highlighting has a TVF file of
> around 450GB (approx 65% of total index size), so I am trying to see
> whether decreasing the index size by dropping the TVF would help
> performance more (less RAM, should be good for I/O too, I guess) or
> whether keeping it is still better.
> 
> I know the best way is to try it out, but indexing takes a very long time,
> so I'm trying to see whether it's even worthwhile.


Try testing on a smaller set. In general, you are saving the process of
re-analyzing the content, so to some extent it is going to depend on how
fast your analyzer chain is. At the size you are at, I don't know if storing
term vectors is worth it.
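
For reference, term vectors are a per-field setting in schema.xml; a sketch
(field name and type are illustrative) of the flags that produce the term
vector (*.tvf) data:

    <field name="content" type="text" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

Dropping the termVectors/termPositions/termOffsets attributes and
re-indexing is what would shrink that file away.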

Re: How to Configure Solr to pick my lucene custom filter

2011-01-26 Thread Valiveti

I am new to Solr and Lucene.

I wrote a custom filter.

The logic is built based on a multi-field value of the found document. Only
documents the user has read access to should be returned.

I would like this custom filter to be used during search to filter out the
documents.

Is there a way to configure this filter on the default search handlers so
that it is picked up during search?

I haven't written any analyzers or token filters. Are they needed for this
scenario?

Thanks,
Valiveti
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Configure-Solr-to-pick-my-lucene-custom-filter-tp2331928p2354772.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to edit / compile the SOLR source code

2011-01-26 Thread Anurag

Actually I also want to edit the source files of Solr. Does that mean I have
to go into the "src" directory of Solr and then rebuild using ant? Do I need
to compile them myself, or will Ant do the whole compiling as well as
updating the jar files?
I have the following files in the Solr-1.3.0 directory:

/home/anurag/apache-solr-1.3.0/build
/home/anurag/apache-solr-1.3.0/client
/home/anurag/apache-solr-1.3.0/contrib
/home/anurag/apache-solr-1.3.0/dist
/home/anurag/apache-solr-1.3.0/docs
/home/anurag/apache-solr-1.3.0/example
/home/anurag/apache-solr-1.3.0/lib
/home/anurag/apache-solr-1.3.0/src
/home/anurag/apache-solr-1.3.0/build.xml
/home/anurag/apache-solr-1.3.0/CHANGES.txt
/home/anurag/apache-solr-1.3.0/common-build.xml
/home/anurag/apache-solr-1.3.0/KEYS.txt
/home/anurag/apache-solr-1.3.0/LICENSE.txt
/home/anurag/apache-solr-1.3.0/NOTICE.txt
/home/anurag/apache-solr-1.3.0/README.txt

and I want to edit the source code to implement my changes. How should I
proceed?

-
Kumar Anurag

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-edit-compile-the-SOLR-source-code-tp477584p2355270.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrDocumentList Size vs NumFound

2011-01-26 Thread Bing Li
Dear all,

I have a weird problem. The number of matching documents is much more than
10, and getNumFound() returns the exact count of results; however, the size
of the SolrDocumentList is 10. When I iterate the results as follows, only
10 are displayed. How do I get the rest?

..
for (SolrDocument doc : docs)
{
    System.out.println(doc.getFieldValue(Fields.CATEGORIZED_HUB_TITLE_FIELD)
        + ": " + doc.getFieldValue(Fields.CATEGORIZED_HUB_URL_FIELD) + "; "
        + doc.getFieldValue(Fields.HUB_CATEGORY_NAME_FIELD) + "/"
        + doc.getFieldValue(Fields.HUB_PARENT_CATEGORY_NAME_FIELD));
}
..

Could you give me a hand?

Thanks,
LB


Re: SolrDocumentList Size vs NumFound

2011-01-26 Thread Markus Jelsma
Hi,

If your query matches 1000 documents and the rows parameter is 10, then
you'll get only 10 documents back. Consult the wiki on the start and rows
parameters:

http://wiki.apache.org/solr/CommonQueryParameters
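
For example, a minimal SolrJ sketch (the page size and field name are
arbitrary; 'server' is an already-configured SolrServer instance) that walks
through all results by advancing start:

    SolrQuery query = new SolrQuery("*:*");
    query.setRows(100);                           // page size
    int start = 0;
    long numFound;
    do {
        query.setStart(start);
        QueryResponse rsp = server.query(query);
        SolrDocumentList page = rsp.getResults();
        numFound = page.getNumFound();
        for (SolrDocument doc : page) {
            System.out.println(doc.getFieldValue("id"));  // process here
        }
        start += page.size();
    } while (start < numFound);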

Cheers.

> Dear all,
> 
> I have a weird problem. The number of matching documents is much more than
> 10, and getNumFound() returns the exact count of results; however, the
> size of the SolrDocumentList is 10. When I iterate the results as follows,
> only 10 are displayed. How do I get the rest?
> 
> ..
> for (SolrDocument doc : docs)
> {
>     System.out.println(doc.getFieldValue(Fields.CATEGORIZED_HUB_TITLE_FIELD)
>         + ": " + doc.getFieldValue(Fields.CATEGORIZED_HUB_URL_FIELD) + "; "
>         + doc.getFieldValue(Fields.HUB_CATEGORY_NAME_FIELD) + "/"
>         + doc.getFieldValue(Fields.HUB_PARENT_CATEGORY_NAME_FIELD));
> }
> ..
> 
> Could you give me a hand?
> 
> Thanks,
> LB


Re: How to Configure Solr to pick my lucene custom filter

2011-01-26 Thread Erick Erickson
Ah, ok. We were talking about different things. "Filter" is an overloaded
term in Solr/Lucene; it's easy to get confused.

No, you do not have to deal with analyzers or tokenfilters in your scenario.

But let's back up a bit here. How are permissions for documents stored?
Because if there's an identifier in the document indicating the user has
access (say a role-based permission scheme with a relatively small number
of auth tokens, say fewer than a few hundred), you don't need to write any
custom code at all; simply append

&fq=auth_token_field:(tok1 tok2 tok3)

to the query.
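
In SolrJ terms, a sketch of the same idea (the field and token names are
hypothetical):

    SolrQuery q = new SolrQuery("the user's query");
    // restrict results to documents carrying one of this user's auth tokens
    q.addFilterQuery("auth_token_field:(tok1 tok2 tok3)");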

This isn't a good solution if users have permissions set on individual
documents, though, since the fq clause could potentially grow very, very
large.

So let's delve into the problem before deciding on a solution, this may be
an XY problem.

Best
Erick

On Wed, Jan 26, 2011 at 10:48 AM, Valiveti wrote:

>
> I am new to Solr and Lucene.
>
> I wrote a custom filter.
>
> The logic is built based on a multi-field value of the found document.
> Only documents the user has read access to should be returned.
>
> I would like this custom filter to be used during search to filter out
> the documents.
>
> Is there a way to configure this filter on the default search handlers so
> that it is picked up during search?
>
> I haven't written any analyzers or token filters. Are they needed for
> this scenario?
>
> Thanks,
> Valiveti
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-Configure-Solr-to-pick-my-lucene-custom-filter-tp2331928p2354772.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to edit / compile the SOLR source code

2011-01-26 Thread Erick Erickson
Sure, at the top level (above src) you should be able to just type
"ant dist", then look in the "dist" directory and there should be a
solr.war.

Best
Erick

On Wed, Jan 26, 2011 at 11:43 AM, Anurag  wrote:

>
> Actually I also want to edit the source files of Solr. Does that mean I
> have to go into the "src" directory of Solr and then rebuild using ant?
> Do I need to compile them myself, or will Ant do the whole compiling as
> well as updating the jar files?
> I have the following files in the Solr-1.3.0 directory:
>
> /home/anurag/apache-solr-1.3.0/build
> /home/anurag/apache-solr-1.3.0/client
> /home/anurag/apache-solr-1.3.0/contrib
> /home/anurag/apache-solr-1.3.0/dist
> /home/anurag/apache-solr-1.3.0/docs
> /home/anurag/apache-solr-1.3.0/example
> /home/anurag/apache-solr-1.3.0/lib
> /home/anurag/apache-solr-1.3.0/src
> /home/anurag/apache-solr-1.3.0/build.xml
> /home/anurag/apache-solr-1.3.0/CHANGES.txt
> /home/anurag/apache-solr-1.3.0/common-build.xml
> /home/anurag/apache-solr-1.3.0/KEYS.txt
> /home/anurag/apache-solr-1.3.0/LICENSE.txt
> /home/anurag/apache-solr-1.3.0/NOTICE.txt
> /home/anurag/apache-solr-1.3.0/README.txt
>
> and I want to edit the source code to implement my changes. How should I
> proceed?
>
> -
> Kumar Anurag
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-edit-compile-the-SOLR-source-code-tp477584p2355270.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to edit / compile the SOLR source code

2011-01-26 Thread Jonathan Rochkind
[Btw, this is great, thank you so much to Solr devs for providing simple 
ant-based compilation, and not making me install specific development 
tools and/or figure out how to use maven to compile, like certain other 
java projects. Just make sure ant is installed and 'ant dist', I can do 
that!  I more or less know how to write Java, at least for simple 
things,  but I still have trouble getting the right brew of required 
Java dev tools working properly to compile some projects! ]


On 1/26/2011 4:19 PM, Erick Erickson wrote:

Sure, at the top level (above src) you should be able to just type
"ant dist", then look in the "dist" directory and there should be a
solr.war.

Best
Erick

On Wed, Jan 26, 2011 at 11:43 AM, Anurag  wrote:


Actually I also want to edit the source files of Solr. Does that mean I have
to go into the "src" directory of Solr and then rebuild using ant? Do I need
to compile them myself, or will Ant do the whole compiling as well as
updating the jar files?
I have the following files in the Solr-1.3.0 directory:

/home/anurag/apache-solr-1.3.0/build
/home/anurag/apache-solr-1.3.0/client
/home/anurag/apache-solr-1.3.0/contrib
/home/anurag/apache-solr-1.3.0/dist
/home/anurag/apache-solr-1.3.0/docs
/home/anurag/apache-solr-1.3.0/example
/home/anurag/apache-solr-1.3.0/lib
/home/anurag/apache-solr-1.3.0/src
/home/anurag/apache-solr-1.3.0/build.xml
/home/anurag/apache-solr-1.3.0/CHANGES.txt
/home/anurag/apache-solr-1.3.0/common-build.xml
/home/anurag/apache-solr-1.3.0/KEYS.txt
/home/anurag/apache-solr-1.3.0/LICENSE.txt
/home/anurag/apache-solr-1.3.0/NOTICE.txt
/home/anurag/apache-solr-1.3.0/README.txt

and I want to edit the source code to implement my changes. How should I
proceed?

-
Kumar Anurag

--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-edit-compile-the-SOLR-source-code-tp477584p2355270.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to Configure Solr to pick my lucene custom filter

2011-01-26 Thread Valiveti

We thought of using "fq", but that does not seem to suit our scenario.

Both deny and grant permissions are stored on the document as rules.

The order of the rules also needs to be considered.
We might have a huge list of values for the ACL field. Each value is
considered to be a rule.
Sample:

  -Group(X)
  +Group(Y)
  [Condition]+Group(Z)
  .


Access to the document should be denied for users who belong to Group X.
Users who belong to Group Y should be granted access.

If a user who belongs to Group Z gets this document during the search, since
the first two rules do not match, it proceeds to the third rule, where the
condition is evaluated based on certain inputs provided to the filter; if
that condition is met, the user is granted access.

This is just an example. We have many more different rules defined.

So we thought of having a filter that can get hold of these rules and
process them.

Note: The rules are not the same for all documents.

Thanks,
Valiveti




-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Configure-Solr-to-pick-my-lucene-custom-filter-tp2331928p2357828.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to edit / compile the SOLR source code

2011-01-26 Thread Erick Erickson
Jonathan:

If you're working off trunk (and 3x), btw, there's a *great* addition,
especially if you use IntelliJ (I haven't personally worked with Eclipse,
but there's a target for that too). Just get the source. Go to the top
level (e.g. apache-trunk). Execute "ant idea". Open IntelliJ and point it at
the directory via "open project".

Go into project settings and set your compiler. Wait for Idea to finish
indexing, etc. Run tests. Use ^N to find classes. Step through the code by
debugging unit tests.

Apart from the one trick of "opening project" on the folder rather than a
cute little IntelliJ icon (select the root directory), the *only* thing you
have to do is set your compiler. After having to spend *days* setting up the
IDE on some projects this is a dream. Apart from getting the source and
waiting for Idea to do its first-pass indexing, I can set up a new IDE
instance in, maybe, 5 minutes. Sweet!

And, BTW, the IntelliJ "community edition" is free and open-source now.

Best
Erick

On Wed, Jan 26, 2011 at 4:41 PM, Jonathan Rochkind  wrote:

> [Btw, this is great, thank you so much to Solr devs for providing simple
> ant-based compilation, and not making me install specific development tools
> and/or figure out how to use maven to compile, like certain other java
> projects. Just make sure ant is installed and 'ant dist', I can do that!  I
> more or less know how to write Java, at least for simple things,  but I
> still have trouble getting the right brew of required Java dev tools working
> properly to compile some projects! ]
>
>
> On 1/26/2011 4:19 PM, Erick Erickson wrote:
>
>> Sure, at the top level (above src) you should be able to just type
>> "ant dist", then look in the "dist" directory and there should be a
>> solr.war.
>>
>> Best
>> Erick
>>
>> On Wed, Jan 26, 2011 at 11:43 AM, Anurag
>>  wrote:
>>
>>> Actually I also want to edit the source files of Solr. Does that mean I
>>> have to go into the "src" directory of Solr and then rebuild using ant?
>>> Do I need to compile them myself, or will Ant do the whole compiling as
>>> well as updating the jar files?
>>> I have the following files in the Solr-1.3.0 directory:
>>>
>>> /home/anurag/apache-solr-1.3.0/build
>>> /home/anurag/apache-solr-1.3.0/client
>>> /home/anurag/apache-solr-1.3.0/contrib
>>> /home/anurag/apache-solr-1.3.0/dist
>>> /home/anurag/apache-solr-1.3.0/docs
>>> /home/anurag/apache-solr-1.3.0/example
>>> /home/anurag/apache-solr-1.3.0/lib
>>> /home/anurag/apache-solr-1.3.0/src
>>> /home/anurag/apache-solr-1.3.0/build.xml
>>> /home/anurag/apache-solr-1.3.0/CHANGES.txt
>>> /home/anurag/apache-solr-1.3.0/common-build.xml
>>> /home/anurag/apache-solr-1.3.0/KEYS.txt
>>> /home/anurag/apache-solr-1.3.0/LICENSE.txt
>>> /home/anurag/apache-solr-1.3.0/NOTICE.txt
>>> /home/anurag/apache-solr-1.3.0/README.txt
>>>
>>> and I want to edit the source code to implement my changes. How should I
>>> proceed?
>>>
>>> -
>>> Kumar Anurag
>>>
>>> --
>>> View this message in context:
>>>
>>> http://lucene.472066.n3.nabble.com/How-to-edit-compile-the-SOLR-source-code-tp477584p2355270.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>


How to group result when search on multiple fields

2011-01-26 Thread cyang2010

Let me give an example to illustrate my question:

On the Netflix site, the search box allows you to search by movie, TV show,
actor, director, and genre.

If "Tomcat" is searched, the result is movie titles with "Tomcat" or
whatever, and somewhere in between it also shows two actors, "Tom Cruise"
and "Tom Hanks", followed by a lot of other movie titles.

If this is all based on the same type of index document (titles that have a
title name and associated actors, directors, and genres), then the search
results are all titles. How is it able to render matching actors as part of
the result? In other words, how does it tell that some movies are returned
because of an actor match?

If it is implemented as two different types of index document (one document
type for titles with name, actors, directors, etc., the other for actors
with actor name and movie/TV titles), how does it merge the results? As far
as I can tell, the actor names can appear anywhere in the search results as
a group. Does it just compare the score of the first actor document with
that of the title match results and then decide where to insert the actor
matches? That could be inaccurate, right? Scores from two different types of
document are not comparable, right?

Let me know your thoughts on this. Thanks in advance.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2358441.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to group result when search on multiple fields

2011-01-26 Thread Dennis Gearon
This is probably either 'shingling' or 'facets'.

Someone more experienced can verify that or add more details.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others’ mistakes, so you do not have to make them
yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message -----
From: cyang2010 
To: solr-user@lucene.apache.org
Sent: Wed, January 26, 2011 3:35:47 PM
Subject: How to group result when search on multiple fields


Let me give an example to illustrate my question:

On the Netflix site, the search box allows you to search by movie, TV show,
actor, director, and genre.

If "Tomcat" is searched, the result is movie titles with "Tomcat" or
whatever, and somewhere in between it also shows two actors, "Tom Cruise"
and "Tom Hanks", followed by a lot of other movie titles.

If this is all based on the same type of index document (titles that have a
title name and associated actors, directors, and genres), then the search
results are all titles. How is it able to render matching actors as part of
the result? In other words, how does it tell that some movies are returned
because of an actor match?

If it is implemented as two different types of index document (one document
type for titles with name, actors, directors, etc., the other for actors
with actor name and movie/TV titles), how does it merge the results? As far
as I can tell, the actor names can appear anywhere in the search results as
a group. Does it just compare the score of the first actor document with
that of the title match results and then decide where to insert the actor
matches? That could be inaccurate, right? Scores from two different types of
document are not comparable, right?

Let me know your thoughts on this. Thanks in advance.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2358441.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to group result when search on multiple fields

2011-01-26 Thread cyang2010

Since it is a search applied to all fields, and the only result that
requires grouping is people (actors/directors), I am guessing this:

1. The search still queries a single index.
2. There are two underlying searches: one matching movie/TV names and genre
names, the other matching the top two actors/directors by name.
3. The two results are merged based on score.

Still, I don't see how the scores of two query results are comparable...
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2358575.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to group result when search on multiple fields

2011-01-26 Thread Markus Jelsma
http://wiki.apache.org/solr/ClusteringComponent
http://wiki.apache.org/solr/FieldCollapsing


Re: How to group result when search on multiple fields

2011-01-26 Thread cyang2010

From a quick look, field collapsing seems to be what I want. I am not sure
yet what the ClusteringComponent is; I will look into it more.

Is "Field Collapsing" a new feature for Solr 4.0 (not yet released)?
If so, I will have to wait for it.

Thanks for pointing it out!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2358756.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delta Import occasionally missing records.

2011-01-26 Thread Lance Norskog
The SolrEntityProcessor would be a top-level entity. You would run a query
like this: &sort=timestamp+desc&rows=1&fl=timestamp. This gives you one data
item: the timestamp of the last item added to the index.

With this, the JDBC sub-entity would create a query that chooses all
rows with a timestamp >= this latest timestamp. It will not be easy to
put this together, but it is possible :)
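
A rough data-config.xml sketch of that shape (attribute names are as I
recall them from the SOLR-1499 patch, and the inner JDBC query is only
illustrative, so verify both against the patch you apply):

    <entity name="lastIndexed" processor="SolrEntityProcessor"
            url="http://localhost:8983/solr"
            query="*:*" fl="timestamp" rows="1">
      <!-- sub-entity selects everything newer than the last indexed doc -->
      <entity name="item" dataSource="jdbc"
              query="SELECT id, date_published, date_created, publish_flag
                     FROM Item
                     WHERE sys_time_stamp >= '${lastIndexed.timestamp}'"/>
    </entity>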

Good luck!

Lance

On Mon, Jan 24, 2011 at 2:04 AM, btucker  wrote:
>
> Thank you for your response.
>
> In what way is 'timestamp' not perfect?
>
> I've looked into the SolrEntityProcessor and added a timestamp field to
> our index. However I'm struggling to work out a query to get the max
> value of the timestamp field, and does the SolrEntityProcessor entity
> appear before the root entity, or does it wrap around the root entity?
>
> On 22 January 2011 07:24, Lance Norskog-2 [via Lucene] <
> ml-node+2307215-627680969-326...@n3.nabble.com
>> wrote:
>
>> The timestamp thing is not perfect. You can instead do a search
>> against Solr and find the latest timestamp in the index. SOLR-1499
>> allows you to search against Solr in the DataImportHandler.
>>
>> On Fri, Jan 21, 2011 at 2:27 AM, btucker <[hidden 
>> email]>
>> wrote:
>>
>> >
>> > Hello
>> >
>> > We've just started using Solr to provide search functionality for our
>> > application, with the DataImportHandler performing a delta-import every
>> > 1 minute, fired by crontab. This works great; however, it does
>> > occasionally miss records that are added to the database while the
>> > delta-import is running.
>>
>> >
>> > Our data-config.xml has the following queries in its root entity:
>> >
>> > query="SELECT id, date_published, date_created, publish_flag FROM Item
>> WHERE
>> > id > 0
>> >
>> > AND record_type_id=0
>> >
>> > ORDER BY id DESC"
>> > preImportDeleteQuery="SELECT item_id AS Id FROM
>> > gnpd_production.item_deletions"
>> > deletedPkQuery="SELECT item_id AS id FROM gnpd_production.item_deletions
>> > WHERE deletion_date >=
>> >
>> > SUBDATE('${dataimporter.last_index_time}', INTERVAL 5 MINUTE)"
>> > deltaImportQuery="SELECT id, date_published, date_created, publish_flag
>> FROM
>> > Item WHERE id > 0
>> >
>> > AND record_type_id=0
>> >
>> > AND id=${dataimporter.delta.id}
>> >
>> > ORDER BY id DESC"
>> > deltaQuery="SELECT id, date_published, date_created, publish_flag FROM
>> Item
>> > WHERE id > 0
>> >
>> > AND record_type_id=0
>> >
>> > AND sys_time_stamp >=
>> >
>> > SUBDATE('${dataimporter.last_index_time}', INTERVAL 1 MINUTE) ORDER BY id
>>
>> > DESC">
>> >
>> > I think the problem I'm having comes from the way Solr stores the
>> > last_index_time in conf/dataimport.properties, as stated on the wiki:
>> >
>> > ""When delta-import command is executed, it reads the start time stored
>> > in conf/dataimport.properties. It uses that timestamp to run delta
>> > queries and after completion, updates the timestamp in
>> > conf/dataimport.properties.""
>> >
>> > Which to me seems to indicate that any records with a timestamp between
>> > when the dataimport starts and ends will be missed, as the
>> > last_index_time is set to when the import completes.
>> >
>> > This doesn't seem quite right to me. I would have expected the
>> > last_index_time to refer to when the dataimport was last STARTED so
>> > that there were no gaps in the timestamps covered.
>> >
>> > I changed the deltaQuery of our config to include the SUBDATE by
>> > INTERVAL 1 MINUTE statement to alleviate this problem, but it only
>> > covers cases where the delta-import takes less than a minute.
>> >
>> > Any ideas as to how this can be overcome, other than increasing the
>> > INTERVAL to something larger?
>> >
>> > Regards
>> >
>> > Barry Tucker
>> > --
>> > View this message in context:
>> http://lucene.472066.n3.nabble.com/Delta-Import-occasionally-missing-records-tp2300877p2300877.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> [hidden email] 

Does Solr support indexing of files other than UTF-8

2011-01-26 Thread prasad deshpande
Hello,


I am able to successfully index and search non-English data (like Hebrew and
Japanese) that was encoded in UTF-8.
However, when I tried to index data encoded in a local encoding such as
Big5, I could not get the desired results: the contents looked garbled when
I searched for all indexed documents.

Converting every document to UTF-8 is not feasible.
I am not very clear on how Solr supports these localizations with encodings
other than UTF-8.
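
For what it's worth, the usual workaround (the per-document conversion I
would prefer to avoid) is to transcode on the client before indexing; a
minimal Java sketch, with file names and the Big5 charset only as examples:

    import java.io.*;
    import java.nio.charset.Charset;

    public class ToUtf8 {
        public static void main(String[] args) throws IOException {
            // read a Big5-encoded file and rewrite it as UTF-8 for indexing
            Reader in = new InputStreamReader(
                    new FileInputStream("doc-big5.txt"), Charset.forName("Big5"));
            Writer out = new OutputStreamWriter(
                    new FileOutputStream("doc-utf8.txt"), Charset.forName("UTF-8"));
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
            in.close();
            out.close();
        }
    }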


I checked the links below:
1. http://lucene.apache.org/java/3_0_3/api/all/index.html
2. http://wiki.apache.org/solr/LanguageAnalysis

Thanks and Regards,
Prasad


configure httpclient to access solr with user credential on third party host

2011-01-26 Thread Darniz

Hello,
I uploaded the solr.war file to my hosting provider and added a security
constraint to the web.xml of my Solr war so that only a specific user with a
certain role can issue GET and POST requests. When I open a browser and type
www.maydomainname.com/solr I get a dialog box to enter a user id and
password. No issues until now.

Now the issue is that I have one more app on the same Tomcat container which
indexes documents into Solr. In order for this app to issue POST requests,
it has to configure the HTTP client credentials. I checked with my hosting
service and they told me that Tomcat is running on port 8834, since Apache
is sitting in front. Below is the code snippet I use to set the HTTP
credentials:

// point SolrJ at the local Tomcat port and attach BASIC-auth credentials
CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8834/solr");
Credentials defaultcreds = new UsernamePasswordCredentials("solr", "solr");
server.getHttpClient().getState().setCredentials(
        new AuthScope("localhost", 8834, AuthScope.ANY_REALM), defaultcreds);

I am getting the following error; any help will be appreciated.
ERROR TP-Processor9 org.apache.jk.common.MsgAjp - BAD packet signature 20559
ERROR TP-Processor9 org.apache.jk.common.ChannelSocket - Error, processing
connection
java.lang.IndexOutOfBoundsException
at java.io.BufferedInputStream.read(BufferedInputStream.java:310)
at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:621)
at
org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:578)
at
org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:686)
at
org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
at java.lang.Thread.run(Thread.java:619)
ERROR TP-Processor9 org.apache.jk.common.MsgAjp - BAD packet signature 20559
ERROR TP-Processor9 org.apache.jk.common.ChannelSocket - Error, processing
connection
java.lang.IndexOutOfBoundsException
at java.io.BufferedInputStream.read(BufferedInputStream.java:310)
at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:621)
at
org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:578)
at
org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:686)
at
org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
at java.lang.Thread.run(Thread.java:619)


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/configure-httpclient-to-access-solr-with-user-credential-on-third-party-host-tp2360364p2360364.html
Sent from the Solr - User mailing list archive at Nabble.com.


A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat

2011-01-26 Thread Simone Tripodi
Hi all guys,
this short mail is just to make the Maven/Solr communities aware that we
have published an Apache Maven archetype[1] (which we lazily called
'solr-packager' :P) that helps Apache Solr developers create complete
standalone Solr-based applications, embedded in Apache Tomcat, with few
operations.
We started developing it internally to reduce and help the `ops` tasks;
since it has been useful for us, we hope it can be for you too, so we
decided to publish it as OSS.
Questions, feedback, constructive criticism, ideas... are more than
welcome; if interested, visit the GitHub[2] page.
Have a nice day, all the best
Simo

[1] http://sourcesense.github.com/solr-packager/
[2] https://github.com/sourcesense/solr-packager

http://people.apache.org/~simonetripodi/
http://www.99soft.org/