Re: Is negative boost possible?

2009-08-19 Thread Marc Sturlese


:>the only way to "negative boost" is to "positively boost" the inverse...
:>
:>  (*:* -field1:value_to_penalize)^10

This will do the job as well, since bq supports pure negative queries (at least
in trunk):
bq=-field1:value_to_penalize^10

http://wiki.apache.org/solr/SolrRelevancyFAQ#head-76e53db8c5fd31133dc3566318d1aad2bb23e07e


hossman wrote:
> 
> 
> : Use decimal figure less than 1, e.g. 0.5, to express less importance.
> 
> but that's still a positive boost ... it still increases the scores of
> documents that match.
> 
> the only way to "negative boost" is to "positively boost" the inverse...
> 
>   (*:* -field1:value_to_penalize)^10
> 
> : > I am looking for a way to assign negative boost to a term in Solr
> query.
> : > Our use scenario is that we want to boost matching documents that are
> : > updated recently and penalize those that have not been updated for a
> long
> : > time.  There are other terms in the query that would affect the scores
> as
> : > well.  For example we construct a query similar to this:
> : > 
> : > *:* field1:value1^2  field2:value2^2 lastUpdateTime:[NOW/DAY-90DAYS TO
> *]^5
> : > lastUpdateTime:[* TO NOW/DAY-365DAYS]^-3
> : > 
> : > I notice it's not possible to simply use a negative boosting factor in
> the
> : > query.  Is there any way to achieve such result?
> : > 
> : > Regards,
> : > Shi Quan He
> : > 
> : >   
> 
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Is-negative-boost-possible--tp25025775p25039059.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Replication over multi-core solr

2009-08-19 Thread Licinio Fernández Maurelo
Hi Vivek,
currently we want to add cores dynamically when the active one reaches
some capacity.
Can you give me some hints on how to achieve this functionality? (Just
wondering whether you used shell scripting or coded a 100%
Java-based solution.)

Thx


2009/8/19 Noble Paul നോബിള്‍  नोब्ळ् :
> On Wed, Aug 19, 2009 at 2:27 AM, vivek sar wrote:
>> Hi,
>>
>>  We use multi-core setup for Solr, where new cores are added
>> dynamically to solr.xml. Only one core is active at a time. My
>> question is how can the replication be done for multi-core - so every
>> core is replicated on the slave?
>
> replication does not handle new core creation. You will have to issue
> the core creation command to each slave separately.
>>
>> I went over the wiki, http://wiki.apache.org/solr/SolrReplication,
>> and few questions related to that,
>>
>> 1) How do we replicate solr.xml where we have list of cores? Wiki
>> says, "Only files in the 'conf' dir of solr instance is replicated. "
>> - since, solr.xml is in the home directory how do we replicate that?
> solr.xml cannot be replicated. Even if you did, it is not reloaded.
>>
>> 2) Solrconfig.xml in slave takes a static core url,
>>
>>    <str name="masterUrl">http://localhost:port/solr/corename/replication</str>
>
> put a placeholder like
> <str name="masterUrl">http://localhost:port/solr/${solr.core.name}/replication</str>
> so the corename is automatically replaced
>
>>
>> As in our case cores are created dynamically (new core created after
>> the active one reaches some capacity), how can we define master core
>> dynamically for replication? The only way I see is using the "fetchIndex"
>> command and passing the new core info there - is it right? If so, does the
>> slave application have to write code to poll the Master periodically and fire
>> the "fetchIndex" command, but how would the Slave know the Master corename -
>> as they are created dynamically on the Master?
>>
>> Thanks,
>> -vivek
>>
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>
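
For reference, Noble's placeholder sits in the slave section of the ReplicationHandler configuration in each slave's solrconfig.xml. A minimal sketch (the poll interval shown here is illustrative):

```xml
<!-- slave solrconfig.xml: ${solr.core.name} expands to the current core's name -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master_host:port/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```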



-- 
Lici


Problems importing HTML content contained within XML document

2009-08-19 Thread venn hardy

Hello,

I have just started trying out Solr to index some XML documents that I receive.
I am
using Solr 1.3 and its HttpDataSource in conjunction with the
XPathEntityProcessor.

 

I am finding the data import really useful so far, but I am having a few
problems when
I try to import HTML contained within one of the XML tags. The data
import just seems
to ignore the text content silently, but it imports everything else.

 

When I do a query through the Solr admin interface, only the id and author
fields are displayed.

Any ideas what I am doing wrong?

 

Thanks

 

This is what my dataConfig looks like:

  
  
 http://localhost:9080/data/20090817070752.xml"; 
processor="XPathEntityProcessor" forEach="/document/category" 
transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
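
The dataConfig pasted here was mangled by the list archive. Based on the attributes that survived, it would have looked roughly like this (the entity name, field columns, and xpaths are illustrative, not from the original):

```xml
<dataConfig>
  <dataSource type="HttpDataSource" name="dataSource"/>
  <document>
    <entity name="category"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor"
            forEach="/document/category"
            transformer="DateFormatTransformer"
            stream="true"
            dataSource="dataSource">
      <!-- illustrative field mappings -->
      <field column="id" xpath="/document/category/id"/>
      <field column="author" xpath="/document/category/author"/>
      <field column="body" xpath="/document/category/BODY"/>
    </entity>
  </document>
</dataConfig>
```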
 
  
  
 
  


 

This is how I have specified my schema


   
   


 id
 id

 

And this is what my XML document looks like:


 
  123456
  Authori name
  
  Lorem ipsum dolor sit amet, consectetur adipiscing elit.
  Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius 
varius felis ut vestibulum
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit,
  lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut 
vestibulum
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit,
  lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut 
vestibulum
  
 


_
Looking for a place to rent, share or buy this winter? Find your next place 
with Ninemsn property
http://a.ninemsn.com.au/b.aspx?URL=http%3A%2F%2Fninemsn%2Edomain%2Ecom%2Eau%2F%3Fs%5Fcid%3DFDMedia%3ANineMSN%5FHotmail%5FTagline&_t=774152450&_r=Domain_tagline&_m=EXT

Re: CorruptIndexException: Unknown format version

2009-08-19 Thread Licinio Fernández Maurelo
It looks like your Solr lucene-core version doesn't match the
Lucene version used to generate the index; as Yonik said, it looks like
there is a Lucene library conflict.

2009/8/19 Chris Hostetter :
>
> : how can that happen, it is a new index, and it is already corrupt?
> :
> : Did anybody else something like this?
>
> "Unknown format version" doesn't mean your index is corrupt .. it means
> the version of Lucene parsing the index doesn't recognize the index format
> version ... typically it means you are trying to open an index generated
> by a newer version of Lucene than the one you are using.
>
>
>
>
> -Hoss
>
>



-- 
Lici


Re: Replication over multi-core solr

2009-08-19 Thread vivek sar
Licinio,

 Please open a separate thread - as it's a different issue - and I can
respond there.

-vivek

2009/8/19 Licinio Fernández Maurelo :
> [quoted text trimmed]


Adding cores dynamically

2009-08-19 Thread Licinio Fernández Maurelo
Hi there,

currently we want to add cores dynamically when the active one reaches
some capacity.
Can anyone give me some hints on how to achieve this functionality? (Just
wondering whether you have used shell scripting or coded a 100%
Java-based solution.)

Thx


-- 
Lici


Re: Replication over multi-core solr

2009-08-19 Thread Licinio Fernández Maurelo
Ok

2009/8/19 vivek sar :
> Licinio,
>
>  Please open a separate thread - as it's a different issue - and I can
> respond there.
>
> -vivek
>
> [quoted text trimmed]



-- 
Lici


Re: Spanish Stemmer

2009-08-19 Thread Licinio Fernández Maurelo
Hi, take a look at this:
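
(The snippet below was stripped by the archive; a typical Spanish-stemming field type uses the standard Solr factory classes shown here, with an illustrative fieldType name:)

```xml
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- language must be spelled exactly "Spanish" -->
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldType>
```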



  





  
  





  


Un saludo

2009/8/19 Robert Muir :
> hi, it looks like you might just have a simple typo:
>
>  
>
> if you change it to language="Spanish" it should work.
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Lici


Re: Strange error with shards

2009-08-19 Thread Licinio Fernández Maurelo
Looks like the index is corrupted, try restoring it

2009/8/18 ahammad :
>
> Hello,
>
> I have been using multicore/shards for the past 5 months or so with no
> problems at all. I just added another core to my Solr server, but for some
> reason I can never get the shards working when that specific core is
> anywhere in the URL (either in the shards list or the base URL).
>
> HTTP Status 500 - null java.lang.NullPointerException at
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:437)
> at
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:281)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330) at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
> at
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
> at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
> at java.lang.Thread.run(Thread.java:619)
>
> The way I created this shard was to copy an existing one, erasing all the
> data files/folders, and modifying my schema/data-config files. So the core
> settings are pretty much the same.
>
> If I try the shard parameter with any of the other 7 cores that I have, it
> works fine. It's only when this specific one is in the URL...
>
> Cheers
> --
> View this message in context: 
> http://www.nabble.com/Strange-error-with-shards-tp25027486p25027486.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lici


Re: Problems importing HTML content contained within XML document

2009-08-19 Thread Martijn v Groningen
Hi Venn,

I think what is happening when the BODY element is processed by the
xpath expression (/document/category/BODY) is that it does not
retrieve the text content from the P elements inside the BODY element.
The expression will only retrieve text content that is directly a
child of the BODY element. I do not know which xpath function(s) the
DataImportHandler currently supports to return the text content of a
node and all its child nodes.

Maybe the expression /document/category/BODY/* will work.

Cheers,

Martijn

2009/8/19 venn hardy :
> [quoted text trimmed]


Re: Problems importing HTML content contained within XML document

2009-08-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
try this


this should slurp all the tags under body
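
The field definition Noble pasted was stripped by the archive; his suggestion was most likely the XPathEntityProcessor's flatten attribute, which concatenates the text of a node and all of its descendants (the column name here is illustrative):

```xml
<field column="body" xpath="/document/category/BODY" flatten="true"/>
```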

On Wed, Aug 19, 2009 at 1:44 PM, venn hardy wrote:
> [quoted text trimmed]



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Problems importing HTML content contained within XML document

2009-08-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
sorry


2009/8/19 Noble Paul നോബിള്‍  नोब्ळ् :
> [quoted text trimmed]



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Relevant results with DisMaxRequestHandler

2009-08-19 Thread Vincent Pérès

Wow, it's like the 'mm' parameter just appeared for the first time...
Yes, I read the docs a few times, but never understood that documents which
don't match enough of the clauses will not be returned... my apologies,
everything seems much clearer now thanks to the minimum-match parameter.

Thank you,
Vincent


hossman wrote:
> 
> 
> : The 'qf' parameter used in the dismax seems to work with an 'AND'
> separator.
> : I have much more results without dixmax. Is there any way to keep the
> same
> : amount of document and process the 'qf' ?
> 
> did you read any of the docs on dismax?
> 
>   http://wiki.apache.org/solr/DisMaxRequestHandler
> 
> did you look at the "mm" param?
> 
>   http://wiki.apache.org/solr/DisMaxRequestHandler#mm
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Relevant-results-with-DisMaxRequestHandler-tp24716870p25041314.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: JVM Heap utilization & Memory leaks with Solr

2009-08-19 Thread Rahul R
Fuad,
We have around 5 million documents and around 3700 fields. Not all documents
will have values for all the fields. JRockit is not approved for use
within my organization, but thanks for the info anyway.

Regards
Rahul

On Tue, Aug 18, 2009 at 9:41 AM, Funtick  wrote:

>
> BTW, you should really prefer JRockit, which really rocks!!!
>
> "Mission Control" has the necessary tooling; and JRockit produces a _nice_
> exception stacktrace (explaining almost everything) even in case of OOM,
> which the Sun JVM still fails to produce.
>
>
> SolrServlet still catches "Throwable":
>
>} catch (Throwable e) {
>  SolrException.log(log,e);
>  sendErr(500, SolrException.toStr(e), request, response);
>} finally {
>
>
>
>
>
> Rahul R wrote:
> >
> > Otis,
> > Thank you for your response. I know there are a few variables here but
> the
> > difference in memory utilization with and without shards somehow leads me
> > to
> > believe that the leak could be within Solr.
> >
> > I tried using a profiling tool - Yourkit. The trial version was free for
> > 15
> > days. But I couldn't find anything of significance.
> >
> > Regards
> > Rahul
> >
> >
> > On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic
> >  >> wrote:
> >
> >> Hi Rahul,
> >>
> >> A) There are no known (to me) memory leaks.
> >> I think there are too many variables for a person to tell you what
> >> exactly
> >> is happening, plus you are dealing with the JVM here. :)
> >>
> >> Try jmap -histo:live PID-HERE | less and see what's using your memory.
> >>
> >> Otis
> >> --
> >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >>
> >>
> >>
> >> - Original Message 
> >> > From: Rahul R 
> >> > To: solr-user@lucene.apache.org
> >> > Sent: Tuesday, August 4, 2009 1:09:06 AM
> >> > Subject: JVM Heap utilization & Memory leaks with Solr
> >> >
> >> > I am trying to track memory utilization with my Application that uses
> >> Solr.
> >> > Details of the setup :
> >> > -3rd party Software : Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0
> >> > - Hardware : 12 CPU, 24 GB RAM
> >> >
> >> > For testing during PSR I am using a smaller subset of the actual data
> >> that I
> >> > want to work with. Details of this smaller sub-set :
> >> > - 5 million records, 4.5 GB index size
> >> >
> >> > Observations during PSR:
> >> > A) I have allocated 3.2 GB for the JVM(s) that I used. After all users
> >> > logout and doing a force GC, only 60 % of the heap is reclaimed. As
> >> part
> >> of
> >> > the logout process I am invalidating the HttpSession and doing a
> >> close()
> >> on
> >> > CoreContainer. From my application's side, I don't believe I am
> holding
> >> on
> >> > to any resource. I wanted to know if there are known issues
> surrounding
> >> > memory leaks with Solr ?
> >> > B) To further test this, I tried deploying with shards. 3.2 GB was
> >> allocated
> >> > to each JVM. All JVMs had 96 % free heap space after start up. I got
> >> varying
> >> > results with this.
> >> > Case 1 : Used 6 weblogic domains. My application was deployed one 1
> >> domain.
> >> > I split the 5 million index into 5 parts of 1 million each and used
> >> them
> >> as
> >> > shards. After multiple users used the system and doing a force GC,
> >> around
> >> 94
> >> > - 96 % of heap was reclaimed in all the JVMs.
> >> > Case 2: Used 2 weblogic domains. My application was deployed on 1
> >> domain.
> >> On
> >> > the other, I deployed the entire 5 million part index as one shard.
> >> After
> >> > multiple users used the system and doing a force GC, around 76 % of
> the
> >> heap
> >> > was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM
> where
> >> my
> >> > application was running. This result further convinces me that my
> >> > application can be absolved of holding on to memory resources.
> >> >
> >> > I am not sure how to interpret these results ? For searching, I am
> >> using
> >> > Without Shards : EmbeddedSolrServer
> >> > With Shards :CommonsHttpSolrServer
> >> > In terms of Solr objects this is what differs in my code between
> normal
> >> > search and shards search (distributed search)
> >> >
> >> > After looking at Case 1, I thought that the CommonsHttpSolrServer was
> >> more
> >> > memory efficient but Case 2 proved me wrong. Or could there still be
> >> memory
> >> > leaks in my application ? Any thoughts, suggestions would be welcome.
> >> >
> >> > Regards
> >> > Rahul
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/JVM-Heap-utilization---Memory-leaks-with-Solr-tp24802380p25018165.html
>  Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Solr-773 (GEO Module) question

2009-08-19 Thread johan . sjoberg
Hi,


we're looking at the GEO search module from JIRA issue SOLR-773
(http://issues.apache.org/jira/browse/SOLR-773).


It seems to us that the issue is still open and not yet included in the
nightly builds.

Is there a release plan for the nightly builds, and is this module
considered core or contrib?



Regards,
Johan



Re: MultiCore Queries? are they possible

2009-08-19 Thread Shalin Shekhar Mangar
On Tue, Aug 18, 2009 at 5:47 PM, Ninad Raut wrote:

> Hi,
> Can we create a Join query between two indexes on two cores? Is this
> possible in Solr?
> I have a index which stores author profiles and other index which stores
> content and a author id as a reference. Can I query as
> select Content,AuthorName
> from Core0,Core1
> where core0.authorid = core1.authorid and authorid=A123


No, but you can always make two calls and join the results yourself. However, Solr
supports multi-valued fields, so it is best to de-normalize the data if you
need to show both kinds of information in one query.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Strange error with shards

2009-08-19 Thread Shalin Shekhar Mangar
On Tue, Aug 18, 2009 at 9:01 PM, ahammad  wrote:

> HTTP Status 500 - null java.lang.NullPointerException at
>
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:437)
> at
>
> The way I created this shard was to copy an existing one, erasing all the
> data files/folders, and modifying my schema/data-config files. So the core
> settings are pretty much the same.
>

What did you modify in the schema? All the shards should have the same
schema. That exception can come if the uniqueKey is missing/null.
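
For reference, the uniqueKey is declared in each shard's schema.xml and must name a field that every document populates. A minimal sketch (field name and type illustrative):

```xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```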

-- 
Regards,
Shalin Shekhar Mangar.


Re: Passing a Cookie in SolrJ

2009-08-19 Thread Shalin Shekhar Mangar
On Tue, Aug 18, 2009 at 10:18 PM, Ramirez, Paul M (388J) <
paul.m.rami...@jpl.nasa.gov> wrote:

> Hi All,
>
> The project I am working on is using Solr and OpenSSO (Sun's single sign on
> service). I need to write some sample code for our users that shows them how
> to query Solr and I would just like to point them to the SolrJ documentation
> but I can't see an easy way to be able to pass a cookie with the request.
> The cookie is needed to be able to get through the SSO layer but will just
> be ignored by Solr. I see that you are using Apache Commons Http Client and
> with that I would be able to write the cookie if I had access to the
> HttpMethod being used (GetMethod or PostMethod). However, I can not find an
> easy way to get access to this with SolrJ and thought I would ask before
> rewriting a simple example using only an Apache HttpClient without the SolrJ
> library. Thanks in advance for any pointers you may have.
>

There's no easy way I think. You can extend CommonsHttpSolrServer and
override the request method. Copy/paste the code from
CommonsHttpSolrServer#request and make the changes. It is not an elegant way
but it will work.

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to boost fields with many terms against single-term?

2009-08-19 Thread Shalin Shekhar Mangar
On Wed, Aug 19, 2009 at 12:32 AM, Fuad Efendi  wrote:

> I don't want single-term docs such as "home" to appear at the top for a simple
> search for "home"; I need "home improvement made easy" at the top... How to
> implement it at query time?
>

If you always want "home improvement made easy" on top for "home", see if
the QueryElevationComponent can help.
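
The QueryElevationComponent is configured with an elevate.xml file that pins chosen documents to the top of the results for a given query text. A minimal sketch (the doc id is illustrative):

```xml
<elevate>
  <query text="home">
    <doc id="doc-home-improvement-made-easy"/>
  </query>
</elevate>
```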

-- 
Regards,
Shalin Shekhar Mangar.


Re: Strange error with shards

2009-08-19 Thread ahammad

Each core has a different database as a datasource, which means that they
have different DB structures and fields. That is why the schemas are
different.

I figured out the cause of this problem. You were right, it was the
uniqueKey field. All of my cores have that field set to "id" but for this
new core, it is set to "threadID". Changing that to id fixed the problem.




Shalin Shekhar Mangar wrote:
> 
> On Tue, Aug 18, 2009 at 9:01 PM, ahammad  wrote:
> 
>> HTTP Status 500 - null java.lang.NullPointerException at
>>
>> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:437)
>> at
>>
>> The way I created this shard was to copy an existing one, erasing all the
>> data files/folders, and modifying my schema/data-config files. So the
>> core
>> settings are pretty much the same.
>>
> 
> What did you modify in the schema? All the shards should have the same
> schema. That exception can come if the uniqueKey is missing/null.
> 
> If all the shards should have the same schema, then what is the point of
> sharding in the first place? I thought that it was used to combine
> different cores with different index structures...Right now, every core I
> have is unique, and every schema is different...
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Strange-error-with-shards-tp25027486p25043859.html
Sent from the Solr - User mailing list archive at Nabble.com.



Putting a something as first query result

2009-08-19 Thread Tobias Brennecke
Hello,

I'm a bit new to solr and have the following problem, it's about events and
venues.
If a user types a name of a venue, then I'd like to return the exact match
for the venue first and then the list of events taking place at this venue.
Currently I have defined a document bound to a database query as its source,
with fields for e.g. event name, venue id, venue name, ...

The thing is, I cannot figure out the best way to implement this. My
approach of defining two documents in data-config.xml, where one query
imports only the venues and the other the full event information, failed.
Solr just won't return any results then :(
I hope anyone can help me.

Thanks in Advance!

Tobi


Re: Strange error with shards

2009-08-19 Thread Shalin Shekhar Mangar
On Wed, Aug 19, 2009 at 6:44 PM, ahammad  wrote:

>
> Each core has a different database as a datasource, which means that they
> have different DB structures and fields. That is why the schemas are
> different.


> > If all the shards should have the same schema, then what is the point of
> > sharding in the first place? I thought that it was used to combine
> > different cores with different index structures...Right now, every core I
> > have is unique, and every schema is different...
> >
>

Index is sharded when it becomes too much for one box to keep the whole
index. Distributed Search in Solr can merge these multiple indexes running
on different boxes into one result set. It is not meant for combining
different cores or different schemas. If many shards have a document with
the same uniqueKey value, any one can be returned. Typically, shards have
the same schema, with each having a disjoint subset of the complete set of
documents.
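Concretely, a distributed request just lists the shards whose results should be merged (hosts and ports here are illustrative):

```
http://host1:8983/solr/select?q=title:home&shards=host1:8983/solr,host2:8983/solr
```

The core that receives the request merges the per-shard results into one response, which is why a shared schema and unique uniqueKey values matter.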

-- 
Regards,
Shalin Shekhar Mangar.


Data Modeling

2009-08-19 Thread Vladimir Landman
Hi,

I am trying to create a schema for Solr.   Here is a relational model of what 
our data might look like:

Inventory
-
Sku
Price
Weight

Attributes
---
AttributeName
AttributeValue

Applications
--
Id (Auto-Incrementing)
Sku
VehicleYear
VehicleMake
VehicleModel
VehicleEngine

There can be multiple Application(s) records.  Also, Attributes can also have 
duplicates.  Basically I want to store basic information about our inventory, 
attributes, and applications.  If I didn't have the applications, 
I would simply have:








Since one part might have 3 or 4 attributes, but 100 applications, I want to 
try to avoid having 400 records, but maybe that is just what I will have to do.

I appreciate any help.
--
Vladimir Landman
Northern Auto Parts
 



RE: JVM Heap utilization & Memory leaks with Solr

2009-08-19 Thread Fuad Efendi

Hi Rahul,

JRockit could be used at least in a test environment to monitor JVM (and
troubleshoot SOLR, licensed for-free for developers!); they have even
Eclipse plugin now, and it is licensed by Oracle (BEA)... But, of course, in
large companies test environment is in hands of testers :)


But... 3700 fields will create (over time) 3700 arrays  each of size
5,000,000!!! Even if most of fields are empty for most of documents...
Applicable to non-tokenized single-valued non-boolean fields only, Lucene
internals, FieldCache... and it won't be GC-collected after user log-off...
prefer dedicated box for SOLR.

-Fuad


-Original Message-
From: Rahul R [mailto:rahul.s...@gmail.com] 
Sent: August-19-09 6:19 AM
To: solr-user@lucene.apache.org
Subject: Re: JVM Heap utilization & Memory leaks with Solr

Fuad,
We have around 5 million documents and around 3700 fields. Not all documents
will have values for all the fields. JRockit is not approved for use
within my organization. But thanks for the info anyway.

Regards
Rahul

On Tue, Aug 18, 2009 at 9:41 AM, Funtick  wrote:

>
> BTW, you should really prefer JRockit which really rocks!!!
>
> "Mission Control" has necessary toolongs; and JRockit produces _nice_
> exception stacktrace (explaining almost everything) in case of even OOM
> which SUN JVN still fails to produce.
>
>
> SolrServlet still catches "Throwable":
>
>} catch (Throwable e) {
>  SolrException.log(log,e);
>  sendErr(500, SolrException.toStr(e), request, response);
>} finally {
>
>
>
>
>
> Rahul R wrote:
> >
> > Otis,
> > Thank you for your response. I know there are a few variables here but
> the
> > difference in memory utilization with and without shards somehow leads
me
> > to
> > believe that the leak could be within Solr.
> >
> > I tried using a profiling tool - Yourkit. The trial version was free for
> > 15
> > days. But I couldn't find anything of significance.
> >
> > Regards
> > Rahul
> >
> >
> > On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic
> >  >> wrote:
> >
> >> Hi Rahul,
> >>
> >> A) There are no known (to me) memory leaks.
> >> I think there are too many variables for a person to tell you what
> >> exactly
> >> is happening, plus you are dealing with the JVM here. :)
> >>
> >> Try jmap -histo:live PID-HERE | less and see what's using your memory.
> >>
> >> Otis
> >> --
> >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >>
> >>
> >>
> >> - Original Message 
> >> > From: Rahul R 
> >> > To: solr-user@lucene.apache.org
> >> > Sent: Tuesday, August 4, 2009 1:09:06 AM
> >> > Subject: JVM Heap utilization & Memory leaks with Solr
> >> >
> >> > I am trying to track memory utilization with my Application that uses
> >> Solr.
> >> > Details of the setup :
> >> > -3rd party Software : Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0
> >> > - Hardware : 12 CPU, 24 GB RAM
> >> >
> >> > For testing during PSR I am using a smaller subset of the actual data
> >> that I
> >> > want to work with. Details of this smaller sub-set :
> >> > - 5 million records, 4.5 GB index size
> >> >
> >> > Observations during PSR:
> >> > A) I have allocated 3.2 GB for the JVM(s) that I used. After all
users
> >> > logout and doing a force GC, only 60 % of the heap is reclaimed. As
> >> part
> >> of
> >> > the logout process I am invalidating the HttpSession and doing a
> >> close()
> >> on
> >> > CoreContainer. From my application's side, I don't believe I am
> holding
> >> on
> >> > to any resource. I wanted to know if there are known issues
> surrounding
> >> > memory leaks with Solr ?
> >> > B) To further test this, I tried deploying with shards. 3.2 GB was
> >> allocated
> >> > to each JVM. All JVMs had 96 % free heap space after start up. I got
> >> varying
> >> > results with this.
> >> > Case 1 : Used 6 weblogic domains. My application was deployed one 1
> >> domain.
> >> > I split the 5 million index into 5 parts of 1 million each and used
> >> them
> >> as
> >> > shards. After multiple users used the system and doing a force GC,
> >> around
> >> 94
> >> > - 96 % of heap was reclaimed in all the JVMs.
> >> > Case 2: Used 2 weblogic domains. My application was deployed on 1
> >> domain.
> >> On
> >> > the other, I deployed the entire 5 million part index as one shard.
> >> After
> >> > multiple users used the system and doing a gorce GC, around 76 % of
> the
> >> heap
> >> > was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM
> where
> >> my
> >> > application was running. This result further convinces me that my
> >> > application can be absolved of holding on to memory resources.
> >> >
> >> > I am not sure how to interpret these results ? For searching, I am
> >> using
> >> > Without Shards : EmbeddedSolrServer
> >> > With Shards :CommonsHttpSolrServer
> >> > In terms of Solr objects this is what differs in my code between
> normal
> >> > search and shards search (distributed search)

multi words synonyms

2009-08-19 Thread Jae Joo
Hi,

I would like to make "internal medicine" a synonym for "physician" or
"doctor", but it is not working properly. Can anyone help me?

synonym.index.txt
internal medicine  => physician

synonyms.query.txt
physician, internal medicine  => physician, doctor

In the Analysis tool, I can see clearly that internal medicine is converted
to physician and doctor at index and query time, but in an actual query
it is not converted (checked with the debugQuery=true parameter).


internal medicine
internal medicine
job:intern job:medicin
job:intern job:medicin

It returns

1.3963256
874878_INTERNATIONAL CONSULTANTS


Here is what I have in schema.xml
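In simplified form, the analyzer wiring looks like this (the exact filter order is illustrative, and the actual chain also stems, as the debug output above shows):

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonym.index.txt"
          ignoreCase="true" expand="false"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.query.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```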
   
  
  

   
  
  


Shutdown Solr

2009-08-19 Thread Miller, Michael P.
Does anyone know a graceful way to shutdown Solr?  (other than killing
the process with Ctrl-C)


Re: Shutdown Solr

2009-08-19 Thread Tobias Brennecke
it catches the kill signal and shuts down as it should, I guess :) because
it writes stuff to the log after pressing ^c

2009/8/19 Miller, Michael P. 

> Does anyone know a graceful way to shutdown Solr?  (other than killing
> the process with Ctrl-C)
>


Re: Data Modeling

2009-08-19 Thread Smiley, David W.
This is the sort of Solr fundamentals question my book (chapter 2) will help 
you with.

Think about what your user interface is.  What are users searching for?  That 
is, what exactly comes back from search results?  It's not clear from your 
description what your search scenario is.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/19/09 10:31 AM, "Vladimir Landman"  wrote:

Hi,

I am trying to create a schema for Solr.   Here is a relational model of what 
our data might look like:

Inventory
-
Sku
Price
Weight

Attributes
---
AttributeName
AttributeValue

Applications
--
Id (Auto-Incrementing)
Sku
VehicleYear
VehicleMake
VehicleModel
VehicleEngine

There can be multiple Application(s) records.  Also, Attributes can also have 
duplicates.  Basically I want to store basic information about our inventory, 
attributes, and applications.  If I didn't have the applications,
I would simply have:








Since one part might have 3 or 4 attributes, but 100 applications, I want to 
try to avoid having 400 records, but maybe that is just what I will have to do.

I appreciate any help.
--
Vladimir Landman
Northern Auto Parts





Re: Solr-773 (GEO Module) question

2009-08-19 Thread Ryan McKinley


On Aug 19, 2009, at 6:45 AM, johan.sjob...@findwise.se wrote:


Hi,


we're glancing at the GEO search module known from the jira issue 773
(http://issues.apache.org/jira/browse/SOLR-773).


It seems to us that the issue is still open and not yet included in the
nightly builds.


correct



Is there a release plan for the nightly builds, and is this module
considered core or contrib?



activity on the nightly builds is winding down as we gear up for the  
1.4 release.


After 1.4 is out, I expect progress on the geo stuff.  It will be in  
contrib (not core) and will likely be marked "experimental" for a  
while.  That is, stuff will be added without the expectation that the  
interfaces will be set in stone.


best
ryan


RE: Shutdown Solr

2009-08-19 Thread Fuad Efendi
catalina.sh stop


But SolrServlet catches everything and forgets to implement destroy()!

I am absolutely unsure about Ctrl-C and even have many concerns regarding
catalina.sh stop... J2EE/JEE does not specify any support for threads
other than container-managed ones...

I hope SolrServlet closes the Lucene index (and other resources) and everything
follows the Servlet specs... but I can't find the dummies' method _"destroy()"_ in
SolrServlet!!! It should gracefully close the Lucene index and other resources.

WHY?


-Original Message-
From: Tobias Brennecke [mailto:t.bu...@gmail.com] 
Sent: August-19-09 11:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Shutdown Solr

it catches the kill signal and shuts down as it should, I guess :) because
it writes stuff to the log after pressing ^c

2009/8/19 Miller, Michael P. 

> Does anyone know a graceful way to shutdown Solr?  (other than killing
> the process with Ctrl-C)
>




RE: Shutdown Solr

2009-08-19 Thread Fuad Efendi
SolrDispatchFilter has it:

  public void destroy() {
if (cores != null) {
  cores.shutdown();
  cores = null;
}
  }


It should gracefully shutdown all background threads (used by Lucene
index-merge etc)


Tomcat: catalina.sh stop, shutdown.sh, etc.; 

Ctrl-C is not graceful



-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca] 
Sent: August-19-09 1:53 PM
To: solr-user@lucene.apache.org
Subject: RE: Shutdown Solr

catalina.sh stop


But SolrServlet catches everything and forgets to implement destroy()!

I am absolutely unsure about Ctrl-C and even have many concerns regarding
catalina.sh stop... J2EE/JEE does not specify any support for threads
other than container-managed ones...

I hope SolrServlet closes the Lucene index (and other resources) and everything
follows the Servlet specs... but I can't find the dummies' method _"destroy()"_ in
SolrServlet!!! It should gracefully close the Lucene index and other resources.

WHY?


-Original Message-
From: Tobias Brennecke [mailto:t.bu...@gmail.com] 
Sent: August-19-09 11:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Shutdown Solr

it catches the kill signal and shuts down as it should, I guess :) because
it writes stuff to the log after pressing ^c

2009/8/19 Miller, Michael P. 

> Does anyone know a graceful way to shutdown Solr?  (other than killing
> the process with Ctrl-C)
>






RE: Shutdown Solr

2009-08-19 Thread Fuad Efendi
Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is
smart... I prefer "/etc/init.d/my_tomcat" wrapper around catalina.sh ("su
tomcat", /var/lock etc...) - ok then, Graceful Shutdown depends on how you
started Tomcat.




strange sorting results: each word in field is sorted

2009-08-19 Thread Paul Rosen
I'm trying to sort, but I am not always getting the correct results and 
I'm not sure where to start tracking down the problem.


You can see the problem here (at least until it's fixed!): 
http://nines.performantsoftware.com/search/saved?user=paul&name=poem


If you sort by Title/Ascending, you get partially sorted results, but it 
seems to be using a random word to sort on instead of sorting on the 
entire title.


Page one starts good with:

(blank)
Adieu
Advertisement
Afterwards
etc

but by page 6 it starts to break down:

Elizabeth Barrett Browning
Albert and Elweena
Emerson and Bacon
etc...
Errata
Anne Bannerman: Biographical Essay
Aboringines (Estonia)
etc...

I notice in the above list that there is SOME word that is sorted, just 
not the first one. (In fact, it seems to be the word that appears 
greatest in the sort order.)


Then at the end, for instance page 336, it sorts some titles with 
diacritical marks:


Roman à Clef
The Forgotten Reaping-Hook: Sex in My Ántonia
Social (Re)Visioning in the Fields of My Ántonia
etc...

I'm not sure what info would be useful to help debug. In my schema.xml 
file, I've clipped what seems to be the relevant part:



  



  


multiValued="true"/>


Thanks,
Paul


Re: Shutdown Solr

2009-08-19 Thread Paul Tomblin
On Wed, Aug 19, 2009 at 2:43 PM, Fuad Efendi wrote:
> Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is
> smart... I prefer "/etc/init.d/my_tomcat" wrapper around catalina.sh ("su
> tomcat", /var/lock etc...) - ok then, Graceful Shutdown depends on how you
> started Tomcat.

*No* application is graceful for "kill -9".  The whole point of "kill
-9" is that it's uncatchable.


-- 
http://www.linkedin.com/in/paultomblin


Re: strange sorting results: each word in field is sorted

2009-08-19 Thread Erik Hatcher


On Aug 19, 2009, at 2:45 PM, Paul Rosen wrote:

You can see the problem here (at least until it's fixed!): 
http://nines.performantsoftware.com/search/saved?user=paul&name=poem


Hi Paul - that project looks familiar!  :)

If you sort by Title/Ascending, you get partially sorted results,  
but it seems to be using a random word to sort on instead of sorting  
on the entire title.


I'm not sure what info would be useful to help debug. In my  
schema.xml file, I've clipped what seems to be the relevant part:


positionIncrementGap="100">

 
   
   
   
 


multiValued="true"/>


I'm surprised you're not seeing an exception when trying to sort on  
title given this configuration.  Sorting must be done on single valued  
indexed fields, that have at most a single term indexed per document.   
I recommend you use copyField to copy title to title_sort and  
configure a title_sort field as a "string" or a field type that  
analyzes only to a single term (like simply keyword tokenizing ->  
lower case filter).
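In schema.xml terms, the simplest case amounts to something like this (field names assumed):

```xml
<field name="title_sort" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```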


Erik



RE: Shutdown Solr

2009-08-19 Thread Fuad Efendi
Thanks... "kill" should be / can be graceful; "kill -9" should kill
immediately... no hang at all, that's the whole point...

http://www.nabble.com/Is-kill--9-safe-or-not--td24866506.html




-Original Message-
From: ptomb...@gmail.com [mailto:ptomb...@gmail.com] On Behalf Of Paul
Tomblin
Sent: August-19-09 2:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Shutdown Solr

On Wed, Aug 19, 2009 at 2:43 PM, Fuad Efendi wrote:
> Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is
> smart... I prefer "/etc/init.d/my_tomcat" wrapper around catalina.sh ("su
> tomcat", /var/lock etc...) - ok then, Graceful Shutdown depends on how you
> started Tomcat.

*No* application is graceful for "kill -9".  The whole point of "kill
-9" is that it's uncatchable.


-- 
http://www.linkedin.com/in/paultomblin




WordDelimiterFilter => MultiPhraseQuery?

2009-08-19 Thread jOhn
My issue is with the use of WordDelimiterFilter and how the QueryParser
(Dismax) converts the query into a MultiPhraseQuery.

This is on solr 1.3 / lucene 2.4.1.

For example:

1. yuma -> 3:10 to Yuma
2. yUma -> no results

For #2 it gets split into y + uma and becomes a MultiPhraseQuery requiring
both terms thus no results vs. requiring either one with a preference on
both (or a preference on joining the terms or at least an OR query).
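The splitting behind #2 can be sketched like this (an illustration only, not the actual Lucene implementation, which also splits on digits and punctuation):

```java
import java.util.ArrayList;
import java.util.List;

public class CaseChangeSplit {
    // Rough sketch of WordDelimiterFilter's splitOnCaseChange behavior:
    // break a token at every lower-to-upper case transition.
    static List<String> split(String token) {
        List<String> parts = new ArrayList<String>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            if (Character.isLowerCase(token.charAt(i - 1))
                    && Character.isUpperCase(token.charAt(i))) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        parts.add(token.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("yUma"));      // [y, Uma]
        System.out.println(split("PromNight")); // [Prom, Night]
        System.out.println(split("yuma"));      // [yuma]
    }
}
```

Both pieces then feed into one (Multi)PhraseQuery, which is why "yUma" demands both sub-terms.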

1. joker-man -> Joker-Man Goes For Gold
2. joKerman -> no results
3. jo-kerman -> no results

1. prom night -> Prom Night
2. PromNight -> Prom Night
3. promnight -> no results
4. pRomnIght -> no results

Is there a way to configure this behavior.  I need to support all the above
use-cases.

I have a brute force solution using a copyField and a
non-WordDelimiterFilter analyzer (whitespacetoken, lowercase, patternreplace
punctuation, edgengram) and basically drop into solrconfig.xml a 2nd field
for this (titleNameSubstring2).  Those two combined are pretty much what I
need, but that costs a memory and performance hit, whereas some tuning to
avoid the MultiPhraseQuery would be a better fit.

Here are the schema.xml + solrconfig.xml bits that are not working.

[schema.xml]























[solrconfig.xml]



dismax
explicit
score desc

titleNameSubstring^200.0


titleNameSubstring^2.0


product(releaseYear,0.1)

1


searchable:true



Any ideas?

-netcam


Re: strange sorting results: each word in field is sorted

2009-08-19 Thread Paul Rosen

Erik Hatcher wrote:


On Aug 19, 2009, at 2:45 PM, Paul Rosen wrote:
You can see the problem here (at least until it's fixed!): 
http://nines.performantsoftware.com/search/saved?user=paul&name=poem


Hi Paul - that project looks familiar!  :)


Hi Erik! I should hope so! And I've gone a year without having to delve 
into solr much since it has just plain worked.


Thanks for the speedy reply.

I'm surprised you're not seeing an exception when trying to sort on 
title given this configuration.  Sorting must be done on single valued 
indexed fields, that have at most a single term indexed per document.  I 
recommend you use copyField to copy title to title_sort and configure a 
title_sort field as a "string" or a field type that analyzes only to a 
single term (like simply keyword tokenizing -> lower case filter.


Erik


I want to double check this (since you probably remember how long it 
takes to recreate the indexes). I think you're saying to add these two 
lines, then re-index:

<field name="title_sort" type="string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>
Now, this is case-sensitive, right? So would this make it case-insensitive?


  

  




Also, I'm guessing from seeing the current results that this wouldn't 
collate the characters with diacritical marks correctly. Is there a way 
to indicate that, for instance, A-grave would sort next to A?


And, while I'm on the subject, I have to do the same thing with the 
Author field, but unfortunately, that is sometimes "First Last" and 
sometimes "Last, First". Is there any way to sort those by last name, or 
do I just have to encourage the index people to be more consistent?


I can think of a fairly simple algorithm, but am not sure where to 
implement it:


- if the word "and" or "&" appears, just look at the left side of the 
field (in other words, sort by the first name that appears.)
- if there is a comma, but it is part of ", jr." or some other common 
suffixes like that, ignore it.
- otherwise, if there is no comma, sort by the last word, unless it is 
"jr", "sr", "III", etc., then sort by the word before that.

- otherwise, sort by the first word.

That would get most of the cases.
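A sketch of that heuristic in Java (illustrative only; the suffix list and the handling of "and"/"&" are assumptions to be tuned):

```java
import java.util.Locale;

public class AuthorSortKey {
    // Heuristic sketch of the rules above for deriving a sort key
    // from "First Last" / "Last, First" style author strings.
    static String sortKey(String author) {
        String a = author.trim();
        // Joint authors ("X and Y", "X & Y"): sort by the first name listed.
        a = a.split("\\s+(?:and|&)\\s+")[0].trim();
        // Ignore common suffixes such as ", Jr." or " III".
        a = a.replaceAll("(?i),?\\s+(jr\\.?|sr\\.?|iii|iv|ii)$", "").trim();
        int comma = a.indexOf(',');
        if (comma >= 0) {
            // Already "Last, First": sort by the part before the comma.
            return a.substring(0, comma).trim().toLowerCase(Locale.ENGLISH);
        }
        // "First Last": sort by the last word.
        String[] words = a.split("\\s+");
        return words[words.length - 1].toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(sortKey("Elizabeth Barrett Browning"));  // browning
        System.out.println(sortKey("Browning, Elizabeth Barrett")); // browning
        System.out.println(sortKey("Sammy Davis, Jr."));            // davis
        System.out.println(sortKey("Wordsworth and Coleridge"));    // wordsworth
    }
}
```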

Thanks,
Paul


FW: Data Modeling

2009-08-19 Thread Vladimir Landman
I hit reply and sent this to just David, but I think it should go to the whole 
list:

Hi David,

I want to do 2 kinds of things with Solr, maybe 3 in the future.

1. I want to use  it on our website so that a customer can filter down products 
by different attributes.  So suppose we have:

Inventory
---
ABC, 10
DEF, 15
Attributes

ABC,Brand,ACME Brand
ABC,Water Pump Style,Short
DEF,Brand,Engine Builders
DEF,Water Pump Style, Long


Vehicle Applications
ABC, 1999, Toyota, Camry, 3.1L
ABC, 2000, Toyota, Camry, 3.1L
DEF, 1997, Ford, Focus, 2.5L
DEF, 1998, Ford, Focus, 2.5L

I would like to be able to handle two things:

1. Give the person a list of all the unique years.  When they pick one, show 
them all the Makes for that year.  When they pick that, show all the Models.

Alternatively:
1. Give them a list of makes, then models, then engine, etc...

Also, it would be nice if I could give Solr a Part# (Sku) and have it get all
the attributes for that sku; alternatively, I'd love to be able to drill down
by attributes such as Brand, Water Pump Style, etc.

Please let me know if this email is still not clear...



--
Vladimir Landman
Northern Auto Parts
 

From: Smiley, David W. [mailto:dsmi...@mitre.org] 
Sent: 2009-08-19 10:42 AM
To: solr; Vladimir Landman
Subject: Re: Data Modeling

This is the sort of Solr fundamentals question my book (chapter 2) will help 
you with.

Think about what your user interface is.  What are users searching for?  That 
is, what exactly comes back from search results?  It's not clear from your 
description what your search scenario is.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/19/09 10:31 AM, "Vladimir Landman"  wrote:
Hi,

I am trying to create a schema for Solr.   Here is a relational model of what 
our data might look like:

Inventory
-
Sku
Price
Weight

Attributes
---
AttributeName
AttributeValue

Applications
--
Id (Auto-Incrementing)
Sku
VehicleYear
VehicleMake
VehicleModel
VehicleEngine

There can be multiple Application(s) records.  Also, Attributes can also have 
duplicates.  Basically I want to store basic information about our inventory, 
attributes, and applications.  If I didn't have the applications,
I would simply have:








Since one part might have 3 or 4 attributes, but 100 applications, I want to 
try to avoid having 400 records, but maybe that is just what I will have to do.

I appreciate any help.
--
Vladimir Landman
Northern Auto Parts
 


Re: Adding cores dynamically

2009-08-19 Thread vivek sar
Lici,

  We're doing a similar thing with multi-core - when a core reaches
capacity (in our case 200 million records) we start a new core. We are
doing this via a web service call (the CoreAdmin Create command),

  http://wiki.apache.org/solr/CoreAdmin

This is all done in Java code - before writing we check the number of
records in the core - if it has reached its capacity we create a new core and
then index into it.
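A minimal sketch of building that CoreAdmin CREATE call with plain JDK classes (the base URL, core name, and the convention of naming instanceDir after the core are illustrative assumptions):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CoreAdminCreateUrl {
    // Build a CoreAdmin CREATE request URL for a new core.
    static String createCoreUrl(String solrBase, String coreName) {
        try {
            String name = URLEncoder.encode(coreName, "UTF-8");
            return solrBase + "/admin/cores?action=CREATE"
                    + "&name=" + name
                    + "&instanceDir=" + name;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(createCoreUrl("http://localhost:8983/solr", "core20090819"));
    }
}
```

The returned URL can then be fetched with java.net.URL/HttpURLConnection; CREATE also accepts config, schema, and dataDir parameters when the new core shouldn't reuse the instanceDir defaults.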

-vivek



2009/8/19 Licinio Fernández Maurelo :
> Hi there,
>
> currently we want to add cores dynamically when the active one reaches
> some capacity,
> can anyone give me some hints to achieve such this functionality? (Just
> wondering if you have used shell-scripting or you have code some 100%
> Java based solution)
>
> Thx
>
>
> --
> Lici
>


Re: strange sorting results: each word in field is sorted

2009-08-19 Thread Erik Hatcher


On Aug 19, 2009, at 3:50 PM, Paul Rosen wrote:
I'm surprised you're not seeing an exception when trying to sort on  
title given this configuration.  Sorting must be done on single  
valued indexed fields, that have at most a single term indexed per  
document.  I recommend you use copyField to copy title to  
title_sort and configure a title_sort field as a "string" or a  
field type that analyzes only to a single term (like simply keyword  
tokenizing -> lower case filter).

   Erik


I want to double check this (since you probably remember how long it  
takes to recreate the indexes). I think you're saying to add these  
two lines, then re-index:






For the simplest case, yes.  You do have to be careful the sort field  
is not multiValued - and I believe the NINES model allowed for  
multiple titles.  So it might be necessary for your indexing client to  
specify the single sort field value instead of leveraging copyField.


Now, this is case-sensitive, right? So would this make it case- 
insensitive?


Yes, the above would be case sensitive.

sortMissingLast="true">

 
   
 

stored="true"/>




That  definition isn't quite right - you must have at least  
a tokenizer.  The KeywordTokenizer "tokenizes" the entire string into  
a single token, though.  In Solr's example schema there is a field  
type like this:


<fieldType name="alphaOnlySort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>


Also, I'm guessing from seeing the current results that this  
wouldn't collate the characters with diacritical marks correctly. Is  
there a way to indicate that, for instance, A-grave would sort next  
to A?


Yes, you can incorporate the diacritic normalizing filter into the  
analyzer definition above.  AsciiFoldingFilter or the ISO Latin1 one.


And, while I'm on the subject, I have to do the same thing with the  
Author field, but unfortunately, that is sometimes "First Last" and  
sometimes "Last, First". Is there any way to sort those by last  
name, or do I just have to encourage the index people to be more  
consistent?


Good luck with getting consistency in your domain!  :)

But it certainly makes sense to request that from the data providers,  
in at least some form that can be turned into the sortable value.


I can think of a fairly simple algorithm, but am not sure where to  
implement it:


- if the word "and" or "&" appears, just look at the left side of  
the field (in other words, sort by the first name that appears.)
- if there is a comma, but it is part of ", jr." or some other  
common suffixes like that, ignore it.
- otherwise, if there is no comma, sort by the last word, unless it  
is "jr", "sr", "III", etc., then sort by the word before that.

- otherwise, sort by the first word.


Probably best to implement that in the indexing client code, but  
simple transformations could be implemented using the  
PatternReplaceFilter like above.


Erik



Re: Passing a Cookie in SolrJ

2009-08-19 Thread Lance Norskog
SolrJ uses the Apache Commons HTTP client. This describes the authentication
system:
http://hc.apache.org/httpclient-3.x/authentication.html


*This has code to use authentication*

https://issues.apache.org/jira/browse/SOLR-1238

You might be able to find an openSSO implementation for this. Or hack up a
simple one.
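Failing that, the fallback the original poster mentioned - plain HTTP with the cookie set by hand, skipping SolrJ entirely - is only a few lines of JDK code (host, handler path, and cookie name/value below are made up):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class CookieQuery {
    // Prepare a plain HTTP query to Solr carrying an SSO cookie header.
    static HttpURLConnection prepare(String baseUrl, String q, String cookie) {
        try {
            URL url = new URL(baseUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8"));
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // Solr ignores the cookie; the SSO layer in front of it consumes it.
            conn.setRequestProperty("Cookie", cookie);
            return conn;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn =
                prepare("http://localhost:8983/solr", "*:*", "ssotoken=abc123");
        System.out.println(conn.getURL());
    }
}
```

If staying with SolrJ, another option is to hand a pre-configured Commons HttpClient - whose HttpState already carries the SSO cookie - to the CommonsHttpSolrServer(String, HttpClient) constructor, so every request reuses that state.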

On Wed, Aug 19, 2009 at 5:48 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

>  On Tue, Aug 18, 2009 at 10:18 PM, Ramirez, Paul M (388J) <
> paul.m.rami...@jpl.nasa.gov> wrote:
>
> > Hi All,
> >
> > The project I am working on is using Solr and OpenSSO (Sun's single sign
> on
> > service). I need to write some sample code for our users that shows them
> how
> > to query Solr and I would just like to point them to the SolrJ
> documentation
> > but I can't see an easy way to be able to pass a cookie with the request.
> > The cookie is needed to be able to get through the SSO layer but will
> just
> > be ignored by Solr. I see that you are using Apache Commons Http Client
> and
> > with that I would be able to write the cookie if I had access to the
> > HttpMethod being used (GetMethod or PostMethod). However, I can not find
> an
> > easy way to get access to this with SolrJ and thought I would ask before
> > rewriting a simple example using only an ApacheHttpClient without the
> SolJ
> > library. Thanks in advance for any pointers you may have.
> >
>
> There's no easy way I think. You can extend CommonsHttpSolrServer and
> override the request method. Copy/paste the code from
> CommonsHttpSolrServer#request and make the changes. It is not an elegant
> way
> but it will work.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Lance Norskog
goks...@gmail.com


Re: Shutdown Solr

2009-08-19 Thread Lance Norskog
In production systems I have done a three-stage technique. First, use the
container's standard shutdown tool. Tomcat, JBoss, Jetty all have their
own. Then, sleep for maybe 60 seconds. Then do kill, sleep more, then 'kill
-9'.
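Sketched as a script, with the Tomcat path, the shortened timeouts, and the PID handling all illustrative assumptions:

```shell
#!/bin/sh
# Three-stage shutdown sketch: container stop, then SIGTERM, then SIGKILL.
shutdown_solr() {
    pid="$1"
    cat_sh="${CATALINA_HOME:-/opt/tomcat}/bin/catalina.sh"
    [ -x "$cat_sh" ] && "$cat_sh" stop || true   # stage 1: container shutdown
    i=0
    while [ $i -lt 5 ]; do                       # wait (use ~60s in practice)
        kill -0 "$pid" 2>/dev/null || return 0   # process already gone
        sleep 1
        i=$((i + 1))
    done
    kill "$pid" 2>/dev/null || true              # stage 2: polite SIGTERM
    sleep 2
    kill -9 "$pid" 2>/dev/null || true           # stage 3: last resort
}
```

Invoke it with the container's PID, e.g. shutdown_solr "$(cat /var/run/tomcat.pid)".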
On Wed, Aug 19, 2009 at 12:21 PM, Fuad Efendi  wrote:

> Thanks... "kill" should be / can be graceful; "kill -9" should kill
> immediately... no any hang, whole point...
>
> http://www.nabble.com/Is-kill--9-safe-or-not--td24866506.html
>
>
>
>
> -Original Message-
> From: ptomb...@gmail.com [mailto:ptomb...@gmail.com] On Behalf Of Paul
> Tomblin
> Sent: August-19-09 2:49 PM
> To: solr-user@lucene.apache.org
>  Subject: Re: Shutdown Solr
>
> On Wed, Aug 19, 2009 at 2:43 PM, Fuad Efendi wrote:
> > Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is
> > smart... I prefer "/etc/init.d/my_tomcat" wrapper around catalina.sh ("su
> > tomcat", /var/lock etc...) - ok then, Graceful Shutdown depends on how
> you
> > started Tomcat.
>
> *No* application is graceful for "kill -9".  The whole point of "kill
> -9" is that it's uncatchable.
>
>
> --
> http://www.linkedin.com/in/paultomblin
>
>
>


-- 
Lance Norskog
goks...@gmail.com


Re: DataImportHandler ignoring most rows

2009-08-19 Thread Lance Norskog
It usually helps to make a database view of your query, and then load the
DIH from that view. There are cases where some query syntaxes are mangled on
the way to the DB.
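For the group entity in this thread that might look like the following (the view name is invented; the view itself would wrap the original SELECT with its aliases, string concatenation, and reserved-word quoting):

```xml
<!-- Illustrative: with the SELECT hidden inside a database view named
     group_dih, only trivial SQL crosses the JDBC driver. -->
<entity name="group"
        query="select * from group_dih order by created asc"/>
```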

2009/8/18 Noble Paul നോബിള്‍ नोब्ळ् 

> this comment says that
>   7
>
> the query fetched only 7 rows. If possible open a tool and just run
> the same query and see how many rows are returned
>
> On Wed, Aug 19, 2009 at 3:46 AM, Erik Earle wrote:
> > Using:
> > - apache-solr-1.3.0
> > - java 1.6
> > - tomcat 6
> > - sql server 2005 w/ JSQLConnect 4.0 driver
> >
> > I have a group table with 3007 rows.  I have confirmed the key is
> > unique with "select distinct id from group"  and it returns 3007.  When i
> re-index using http://host:port/solr/dataimport?command=full-import  I
> only get 7 records indexed.  Any insight into what is going on would be
> really great.
> >
> > A partial response:
> >
> >1
> >7
> >0
> >
> >
> > I have other entities that index all the rows without issue.
> >
> > There are no errors in the logs.
> >
> > I am not using any Transformers (and most of my config is not changed
> from install)
> >
> > My schema.xml contains:
> >
> > key
> >
> > and field defs (not a full list of fields):
> >required="true" />
> >required="true" />
> >   
> >   
> >   
> >   
> >
> > data-config.xml
> > <dataConfig>
> >   <dataSource driver="com.jnetdirect.jsql.JSQLDriver"
> >       url="jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2"
> >       user="SocialSite2"
> >       password="SocialSite2" />
> >   <document>
> >     <entity name="..."
> >         query="select 'group.'+id as 'key', 'group' as 'class', name, handle, description, created, updated from group order by created asc">
> >       ...
> >     </entity>
> >     <entity name="..."
> >         query="<...redacted...>">
> >       ...
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> >
> >
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: DataImportHandler ignoring most rows

2009-08-19 Thread erikea...@yahoo.com
I switched to the MS driver and now all is well. Must be an
incompatibility with the JSQLConnect driver.

Sent from my iPhone

On Aug 18, 2009, at 11:47 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> this comment says that
>   7
>
> the query fetched only 7 rows. If possible open a tool and just run
> the same query and see how many rows are returned
>
> On Wed, Aug 19, 2009 at 3:46 AM, Erik Earle  
> wrote:
>> [...]






RE: Data Modeling

2009-08-19 Thread Smiley, David W.
It's getting clearer, Vladimir.  So fundamentally your users are searching for 
products (apparently auto parts) and the different attributes would become 
navigation filters.  If this is right, then your initial schema (the first 
email) is a start, although it's a little ambiguous to interpret because "id" 
and "sku" are overloaded.  Your schema would contain a part id, the part's 
sku, and a field for each "attribute" you mentioned.  I recommend using Solr's 
dynamic fields to define those so that you don't have to explicitly define 
every attribute you'll ever think of for every part in the schema.   The word 
"application" was totally throwing me, but now I believe you mean that this is 
a vehicle, and an auto part is going to work on multiple vehicles.  In Solr, 
you're going to denormalize this related data by inlining the vehicle 
information (aka "application") into each document, where each document is an 
auto part. ...

I think you have a couple approaches on that.

Firstly, I observe that when I'm shopping for autos or for auto parts, I am 
guided through a user interface to pick my precise vehicle, and THEN I see 
related products.  This is straightforward -- you would not use Solr; put this 
information in your database and build an easy app to navigate to a specific 
vehicle to get the vehicle identifier.  You *could* use Solr for this, but it'd 
be in a separate index/core or you would have to use multiple document types in 
your schema (my book has more info on these approaches).  So once you have the 
vehicle identifier, you would look up documents in Solr (aka auto parts) that 
have this vehicle identifier.  It'd be a multi-valued untokenized field, 
and this would be the only vehicle info needed in your schema.

The other approach would be necessary to dynamically filter a list of parts by 
*partial* vehicle choices: picking "Porsche" and "2001" would give you 
parts that will work on a Boxster and a Carrera made in 2001.  Doing this 
correctly is tricky for Solr and its non-relational schema, because there are 
multiple vehicle attributes and an auto part is associated with multiple 
vehicles.  I'll advise more if you need to do this, but hopefully you won't need 
to.  It's a bit advanced and complicated.
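To make the first approach concrete, a rough schema.xml sketch (all field and type names here are illustrative, not taken from an actual schema):

```xml
<!-- Hypothetical fields for an auto-parts index -->
<field name="id"  type="string" indexed="true" stored="true" required="true"/>
<field name="sku" type="string" indexed="true" stored="true" required="true"/>
<!-- catch-all for per-part attributes such as attr_brand, attr_pump_style -->
<dynamicField name="attr_*" type="string" indexed="true" stored="true"/>
<!-- one value per vehicle this part fits -->
<field name="vehicle_id" type="string" indexed="true" stored="true"
       multiValued="true"/>
```

A part document then carries its attributes as `attr_*` fields and one `vehicle_id` value per compatible vehicle, which the UI filters on once the user has picked a vehicle.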

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server

From: Vladimir Landman [v...@northernautoparts.com]
Sent: Wednesday, August 19, 2009 4:01 PM
To: solr-user@lucene.apache.org
Subject: FW: Data Modeling

I hit reply and sent this to just David, but I think it should go to the whole 
list:

Hi David,

I want to do 2 kinds of things with Solr.  Maybe 3 in the future.

1. I want to use it on our website so that a customer can filter down products 
by different attributes.  So suppose we have:

Inventory
---
ABC, 10
DEF, 15
Attributes

ABC,Brand,ACME Brand
ABC,Water Pump Style,Short
DEF,Brand,Engine Builders
DEF,Water Pump Style, Long


Vehicle Applications
ABC, 1999, Toyota, Camry, 3.1L
ABC, 2000, Toyota, Camry, 3.1L
DEF, 1997, Ford, Focus, 2.5L
DEF, 1998, Ford, Focus, 2.5L

I would like to be able to handle two things:

1. Give the person a list of all the unique years.  When they pick one, show 
them all the Makes for that year.  When they pick that, show all the Models.

Alternatively:
1. Give them a list of makes, then models, then engine, etc...

Also, it would be nice if I could give Solr a Part# (Sku) and have it return all 
the attributes for that sku; alternatively, I'd love to be able to drill down 
by attributes such as Brand, Water Pump Style, etc.

Please let me know if this email is still not clear...



--
Vladimir Landman
Northern Auto Parts


From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: 2009-08-19 10:42 AM
To: solr; Vladimir Landman
Subject: Re: Data Modeling

This is the sort of Solr fundamentals question my book (chapter 2) will help 
you with.

Think about what your user interface is.  What are users searching for?  That 
is, what exactly comes back from search results?  It's not clear from your 
description what your search scenario is.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/19/09 10:31 AM, "Vladimir Landman"  wrote:
Hi,

I am trying to create a schema for Solr.   Here is a relational model of what 
our data might look like:

Inventory
-
Sku
Price
Weight

Attributes
---
AttributeName
AttributeValue

Applications
--
Id (Auto-Incrementing)
Sku
VehicleYear
VehicleMake
VehicleModel
VehicleEngine

There can be multiple Application(s) records, and Attributes can also have 
duplicates.  Basically I want to store basic information about our inventory, 
attributes, and applications.  If I didn't have the applications,
I would simply have:








Since one part might have 3 or 4 attributes, but 100 app

Re: dynamic changes to schema

2009-08-19 Thread Marco Westermann

Hi, thanks for your answers. I think I have to go into more detail.

We are talking about a shop application which has products I want to 
search for. These products normally have the standard attributes like 
sku, a name, a price and so on. But the user can add attributes to the 
product. So for example, if he sells books, he could add the author as 
an attribute. Let's say he names this field my_author (but he is free to name 
it as he wants) and he marks this field as searchable in the 
configuration. So I need a field in Solr for the author. Since I can't 
restrict the user to prefix every field with something like my_, dynamic 
fields won't work, will they?


best,
Marco

Constantijn Visinescu schrieb:

huh? I think I lost you :)
You want to use a multivalued field to list what dynamic fields you have in
your document?

Also if you program your application correctly you should be able to
restrict your users from doing anything you please (or don't please in this
case).


On Tue, Aug 18, 2009 at 11:38 PM, Marco Westermann  wrote:

  

hi,

thanks for the advise but the problem with dynamic fields is, that i cannot
restrict how the user calls the field in the application. So there isn't a
pattern I can use. But I thought about using mulitvalued fields for the
dynamically added fields. Good Idea?

thanks,
Marco

Constantijn Visinescu schrieb:



use a dynamic field ?

On Tue, Aug 18, 2009 at 5:09 PM, Marco Westermann 
wrote:



  

Hi there,

is there a possibility to change the solr-schema over php dynamically.
The
web-application I want to index at the moment has the feature to add
fields
to entitys and you can tell this fields that they are searchable. To
realize
this with solr the schema has to change when a searchable field is added
or
removed.

Any suggestions,

Thanks a lot,

Marco Westermann

--
++ Business-Software aus einer Hand ++
++ Internet, Warenwirtschaft, Linux, Virtualisierung ++
http://www.intersales.de
http://www.eisxen.org
http://www.tarantella-partner.de
http://www.medisales.de
http://www.eisfair.net

interSales AG Internet Commerce
Subbelrather Str. 247
50825 Köln

Tel  02 21 - 27 90 50
Fax  02 21 - 27 90 517
Mail i...@intersales.de
Mail m...@intersales.de
Web  www.intersales.de

Handelsregister Köln HR B 30904
Ust.-Id.: DE199672015
Finanzamt Köln-Nord. UstID: nicht vergeben
Aufsichtsratsvorsitzender: Michael Morgenstern
Vorstand: Andrej Radonic, Peter Zander










  









  






Re: dynamic changes to schema

2009-08-19 Thread Erik Hatcher
However, you can have a dynamic "*" field mapping that catches all  
field names that aren't already defined - though all of the fields  
will be the same field type.


Erik



On Aug 19, 2009, at 5:48 PM, Marco Westermann wrote:


Hi, thanks for your answers, I think I have to go more in deatail.

we are talking about a shop-application which have products I want  
to search for. This products normally have the standard attributes  
like sku, a name, a price and so on. But the user can add attributes  
to the product. So for example if he sells books, he could add the  
author as attribute. Lets say he name this field my_author (but he  
is free to name it as he wants) and he tells this field over  the  
configuration, that it is searchable. So I need a field in solr for  
the author. Cause I cant restrict the user to prefix every field  
with something like my_ dynamic fields doesn't work, do they?


best,
Marco

[...]


【solr DIH】A problem about solr delta-imports

2009-08-19 Thread huenzhao

Hi all,

There is a problem when I use Solr delta-imports to update the index. I have
added a "last_modified" column to the table. After I use the "full-import"
command to index the database data, the "dataimport.properties" file
contains nothing, and when I use the "delta-import" command to update the index,
Solr lists all the data in the database, not just the latest data. My
db-data-config.xml:

<dataConfig>
    <dataSource driver="..." url="jdbc:mysql://localhost:3306/funguide"
                user="root" password="root"/>
    <document>
        <entity name="shop"
                query="..."
                deltaQuery="select shop_id from shop where last_modified > '${dataimporter.last_index_time}'">
            <field ... />
            ...
        </entity>
    </document>
</dataConfig>
Anybody know how to solve the problem? Thanks!

enzhao...@gmail.com


-- 
View this message in context: 
http://www.nabble.com/%E3%80%90solr-DIH%E3%80%91A-problem-about-solr-delta-imports-tp25055788p25055788.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: 【solr DIH】A problem about solr delta-imports

2009-08-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
Which version of Solr are you using? Solr 1.3 had a bug with this.

On Thu, Aug 20, 2009 at 9:42 AM, huenzhao wrote:
>
> Hi all,
>
> There is a problem when I use solr delta-imports to update the index. I have
> added the "last_modified" column in the table. After I use the "full-import"
> command to index the database data, the "dataimport.properties" file
> contains nothing, and when I use the "delta-import" command to update index,
> the solr list all the data in database not the lasted data. My
> db-data-config.xml:
>
> [...]



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com
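For reference, a delta-capable DIH entity in Solr 1.4 usually looks roughly like the sketch below; the entity and column names are guessed from the config quoted in this thread, and deltaImportQuery is the per-key query DIH runs for each id that deltaQuery returns:

```xml
<entity name="shop" pk="shop_id"
        query="select * from shop"
        deltaQuery="select shop_id from shop
                    where last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from shop
                          where shop_id = '${dataimporter.delta.shop_id}'">
    ...
</entity>
```

dataimport.properties (holding last_index_time) is only written after a successful import, which is worth checking when delta-import appears to re-fetch everything.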


Re: 【solr DIH】A problem about solr delta-imports

2009-08-19 Thread huenzhao

The version is 1.3.
After I ran the full-import, the Tomcat log shows that Solr did not call
the SolrWriter class.

Do you know the solution for this bug?




Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> which version of solr are you using? .Solr1.3 had a bug with this.
> 
> On Thu, Aug 20, 2009 at 9:42 AM, huenzhao wrote:
>> [...]

-- 
View this message in context: 
http://www.nabble.com/%E3%80%90solr-DIH%E3%80%91A-problem-about-solr-delta-imports-tp25055788p25056379.html
Sent from the Solr - User mailing list archive at Nabble.com.



Is wildcard search not correctly analyzed at query?

2009-08-19 Thread Alexander Herzog
Hi all

sorry for the long post

We are switching from Index Data's Zebra to Solr for a new book
archival/preservation project with multiple languages, so expect more
questions soon (sorry for that).
The features of Solr are pretty cool and more or less overwhelming!

But there is one thing I found after a little test with wildcards.

I'm using the latest svn build and didn't change anything except the
schema.xml
Solr Specification Version: 1.3.0.2009.08.20.07.53.52
Solr Implementation Version: 1.4-dev 806060 - ait015 - 2009-08-20 07:53:52
Lucene Specification Version: 2.9-dev
Lucene Implementation Version: 2.9-dev 804692 - 2009-08-16 09:33:41

I have a text_ws field with this schema config:


   
  
  
  
   

...
and I added a dynamic field for everything since I'm not sure what field
we will use...


...


So I added this content:
...

   X, 143, XIV S.:
   124 feine Farbendrucktafeln mit über 600 Abbildungen;
   24,5 cm.

...

Since it's German, and I couldn't find a tokenizer for German compound
words (any help appreciated), I wanted to search for 'Farb*'.

The final row of the query analyzer in the admin section told me:
farb*
for the content:
x,  143,xiv s.: 124 feine   farbendrucktafeln   mit 
uber600 abbildungen;
24,5cm.

so everything seems to be ok, everything in lower case

Now, for the rest service:
http://localhost:8983/solr/select/?q=PhysicalDescription:Farb*&debugQuery=true
<str name="rawquerystring">PhysicalDescription:Farb*</str>
<str name="querystring">PhysicalDescription:Farb*</str>
<str name="parsedquery">PhysicalDescription:Farb*</str>
<str name="parsedquery_toString">PhysicalDescription:Farb*</str>

Since Farb* has a capital letter, nothing is found.
When using farb* as query, I get the result.

Where can I add/change a query analyzer that "lower cases" wildcard
searches?

thanks, best wishes,
Alexander
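A common workaround: lowercase wildcard terms on the client before sending the query, since wildcard and prefix queries bypass query-time analysis while the indexed tokens are already lowercased. A tiny sketch (the helper name is made up):

```python
def normalize_wildcard(term: str) -> str:
    """Lowercase a wildcard/prefix query term client-side, because
    Lucene/Solr does not run the query analyzer on wildcard terms
    even though the indexed tokens have been lowercased."""
    return term.lower()

print(normalize_wildcard("Farb*"))  # farb*
```

The query sent to Solr then becomes PhysicalDescription:farb*, which matches the lowercased index.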


Re: dynamic changes to schema

2009-08-19 Thread Constantijn Visinescu
There's that, or you can just change the user-entered "my_author" field into
"my_author_customattribute" in code after the user has entered it, and add a
*_customattribute dynamic field to your schema.

You'd have to append the suffix in code at query time as well, and off you go.

Constantijn
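Both suggestions amount to schema.xml entries along these lines (the type name is illustrative):

```xml
<!-- Erik's catch-all: matches any field name not otherwise defined -->
<dynamicField name="*" type="text" indexed="true" stored="true"
              multiValued="true"/>

<!-- Constantijn's suffix convention: only renamed user fields match -->
<dynamicField name="*_customattribute" type="text" indexed="true"
              stored="true"/>
```

The suffix approach keeps user fields from colliding with fields already defined in the schema, at the cost of rewriting field names in the indexing and query code.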

On Wed, Aug 19, 2009 at 11:52 PM, Erik Hatcher wrote:

> However, you can have a dynamic "*" field mapping that catches all field
> names that aren't already defined - though all of the fields will be the
> same field type.
>
>Erik
>
>
>
>
> On Aug 19, 2009, at 5:48 PM, Marco Westermann wrote:
>
>  Hi, thanks for your answers, I think I have to go more in deatail.
>>
>> we are talking about a shop-application which have products I want to
>> search for. This products normally have the standard attributes like sku, a
>> name, a price and so on. But the user can add attributes to the product. So
>> for example if he sells books, he could add the author as attribute. Lets
>> say he name this field my_author (but he is free to name it as he wants) and
>> he tells this field over  the configuration, that it is searchable. So I
>> need a field in solr for the author. Cause I cant restrict the user to
>> prefix every field with something like my_ dynamic fields doesn't work, do
>> they?
>>
>> best,
>> Marco
>>
>> [...]