Couple of questions on solr

2008-06-12 Thread Sachin
[Reposting because for some reason I can't find this on the list, so apologies
for the double post]
Hi All,

I am quite new to Solr and trying to use it with a .NET web site (!). So far,
Solr hasn't given me any major jitters, but I've been stuck with a few things
of late; hopefully I can get them answered over here.

1) Overview: Currently, we have around 20,000 documents to index, with an
individual doc size of around 5k. We have set up faceting on a multi-valued field
(there will be ~20 facets per document).
2) Faceted navigation: I've read that faceted navigation on a multi-valued
field has some performance implications. Unfortunately, the current site
requires multi-valued faceting and I cannot break them into unique fields for
faceting. What is the best way to get
maximum performance on a multi-valued field?
3) We support three kinds of search on our site: a free text search, the
faceted navigation and a negation search. For free text search, we have created
a text field (as defined in the example schema.xml) and use copyField to copy
the other fields into it; the free text search happens on this text field only.
For example, say a doc has two fields, title and description; both are copied
to the text field using copyField, and free text search happens on that field.
Is there any way that I can assign weights for relevancy when searching on this
copied field? I.e., if the user searches for "sunny" and there is a document
(doc1) with title "sunny something" and another document (doc2) with description
"sunny description", is it possible to return doc1 before doc2, given that the
search is happening on the copied field? If that's not possible, is there any
other way this can be achieved? (Please keep in mind that we have around 8
fields per document, 4 of them multi-valued, so searching explicitly on all of
them and assigning boosts at query time would not be much fun.) Can I use the
DisMax handler for this kind of copyField field? If yes, is there any way to
define the DisMax handler as the default handler in solrconfig.xml, without
having to explicitly specify it at query time?

Thanks in advance,
Sachin




Re: Problem with add a XML

2008-06-12 Thread Thomas Lauer

Yes, my file is UTF-8. I have uploaded my file.




Grant Ingersoll-6 wrote:
> 
> 
> On Jun 11, 2008, at 3:46 AM, Thomas Lauer wrote:
> 
>> now I want to add the files to Solr. I have started Solr on Windows
>> in the example directory with java -jar start.jar
>>
>>
>> I have the following Error Message:
>>
>> C:\test\output>java -jar post.jar *.xml
>> SimplePostTool: version 1.2
>> SimplePostTool: WARNING: Make sure your XML documents are encoded in  
>> UTF-8, other encodings are not currently supported
> 
> 
> This is your issue right here.  You have to save that second file in  
> UTF-8.
> 
>>
>> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
>> SimplePostTool: POSTing file 1.xml
>> SimplePostTool: POSTing file 2.xml
>> SimplePostTool: FATAL: Connection error (is Solr running at
>> http://localhost:8983/solr/update 
>>  ?): java.io.IOException: S
>> erver returned HTTP response code: 400 for URL:
>> http://localhost:8983/solr/update
>>
>> C:\test\output>
>>
>> Regards Thomas Lauer
>>
>>
>>
>>
>>
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 
> 
> 
> 
> 
> 
> 
> 
> 
http://www.nabble.com/file/p17794387/2.xml 2.xml 
-- 
View this message in context: 
http://www.nabble.com/Problem-with-add-a-XML-tp17772018p17794387.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: synonym token types and ranking

2008-06-12 Thread Uri Boness
yes... I actually implemented it. I'll just clean up the code and add it to
JIRA.

Uri

On Thu, Jun 12, 2008 at 5:48 AM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> Hi Uri,
>
> Yes, I think that would make sense (word vs. synonym token types).  Custom
> boosting/weighting of original token vs. synonym token(s) also makes sense.
>  Is this something you can provide a patch for?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> - Original Message 
> > From: Uri Boness <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, June 11, 2008 8:56:02 PM
> > Subject: synonym token types and ranking
> >
> > Hi,
> >
> > I've noticed that currently the SynonymFilter replaces the original
> > token with the configured tokens list (which includes the original
> > matched token) and each one of these tokens is of type "word". Wouldn't
> > it make more sense to only mark the original token as type "word" and
> > the other tokens as "synonym" types? In addition, once payloads are
> > integrated with Solr, it would be nice if it would be possible to
> > configure a payload for synonyms. One of the requirements we're
> > currently facing in our project is that matches on synonyms should weigh
> > less than exact matches.
> >
> > cheers,
> > Uri
>
>


Re: Problem with add a XML

2008-06-12 Thread Jón Helgi Jónsson
Usually you get better error messages from the start.jar console, you
don't see anything there?

On Thu, Jun 12, 2008 at 7:49 AM, Thomas Lauer <[EMAIL PROTECTED]> wrote:
>
> Yes, my file is UTF-8. I have uploaded my file.
>
>
>
>
> Grant Ingersoll-6 wrote:
>>
>>
>> On Jun 11, 2008, at 3:46 AM, Thomas Lauer wrote:
>>
>>> now I want to add the files to Solr. I have started Solr on Windows
>>> in the example directory with java -jar start.jar
>>>
>>>
>>> I have the following Error Message:
>>>
>>> C:\test\output>java -jar post.jar *.xml
>>> SimplePostTool: version 1.2
>>> SimplePostTool: WARNING: Make sure your XML documents are encoded in
>>> UTF-8, other encodings are not currently supported
>>
>>
>> This is your issue right here.  You have to save that second file in
>> UTF-8.
>>
>>>
>>> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
>>> SimplePostTool: POSTing file 1.xml
>>> SimplePostTool: POSTing file 2.xml
>>> SimplePostTool: FATAL: Connection error (is Solr running at
>>> http://localhost:8983/solr/update
>>>  ?): java.io.IOException: S
>>> erver returned HTTP response code: 400 for URL:
>>> http://localhost:8983/solr/update
>>>
>>> C:\test\output>
>>>
>>> Regards Thomas Lauer
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>
>>
>>
>>
>>
>>
>>
>>
>>
> http://www.nabble.com/file/p17794387/2.xml 2.xml
> --
> View this message in context: 
> http://www.nabble.com/Problem-with-add-a-XML-tp17772018p17794387.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Couple of questions on solr

2008-06-12 Thread Otis Gospodnetic
Hi,

The answer for 3) is:
Use the DisMax request handler.  In solrconfig.xml, assign weights/boosts to the
different fields.  There is no need to use copyField then, as you can search
multiple fields with DisMax just by specifying them in solrconfig.xml.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
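
Otis's suggestion maps to solrconfig.xml roughly as follows -- a minimal sketch,
assuming a Solr 1.2/1.3-style config; the title and description fields come from
the question and the boosts are illustrative. Marking the handler default="true"
means no qt parameter is needed at query time (drop default="true" from the
standard handler if it already carries it):

  <requestHandler name="dismax" class="solr.DisMaxRequestHandler" default="true">
    <lst name="defaults">
      <str name="qf">title^4.0 description^1.0</str>
      <str name="fl">*,score</str>
    </lst>
  </requestHandler>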


- Original Message 
> From: Sachin <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, June 12, 2008 3:06:31 AM
> Subject: Couple of questions on solr
> 
> [Reposting because for some reason I can't find this on the list, so apologies
> for the double post]
> Hi All,
> 
> I am quite new to Solr and trying to use it with a .NET web site (!). So far,
> Solr hasn't given me any major jitters, but I've been stuck with a few things
> of late; hopefully I can get them answered over here.
> 
> 1) Overview: Currently, we have around 20,000 documents to index, with an
> individual doc size of around 5k. We have set up faceting on a multi-valued
> field (there will be ~20 facets per document).
> 2) Faceted navigation: I've read that faceted navigation on a multi-valued
> field has some performance implications. Unfortunately, the current site
> requires multi-valued faceting and I cannot break them into unique fields for
> faceting. What is the best way to get maximum performance on a multi-valued
> field?
> 3) We support three kinds of search on our site: a free text search, the
> faceted navigation and a negation search. For free text search, we have
> created a text field (as defined in the example schema.xml) and use copyField
> to copy the other fields into it; the free text search happens on this text
> field only. For example, say a doc has two fields, title and description; both
> are copied to the text field using copyField, and free text search happens on
> that field. Is there any way that I can assign weights for relevancy when
> searching on this copied field? I.e., if the user searches for "sunny" and
> there is a document (doc1) with title "sunny something" and another document
> (doc2) with description "sunny description", is it possible to return doc1
> before doc2, given that the search is happening on the copied field? If that's
> not possible, is there any other way this can be achieved? (Please keep in
> mind that we have around 8 fields per document, 4 of them multi-valued, so
> searching explicitly on all of them and assigning boosts at query time would
> not be much fun.) Can I use the DisMax handler for this kind of copyField
> field? If yes, is there any way to define the DisMax handler as the default
> handler in solrconfig.xml, without having to explicitly specify it at query
> time?
> 
> Thanks in advance,
> Sachin
> 
> 



Re: Strategy for presenting fresh data

2008-06-12 Thread Norberto Meijome
On Wed, 11 Jun 2008 22:13:24 -0700 (PDT)
rohit arora <[EMAIL PROTECTED]> wrote:

> I am new to Solr/Lucene. I have only one default core; I am working on creating
> multiple cores.
> Can you help me in this matter.

hi Rohit,
please do NOT hijack the thread. You are far more likely to get useful, helpful 
answers if you state your question on a new email, with an appropriate subject.

thanks,
B
_
{Beto|Norberto|Numard} Meijome

"Some cause happiness wherever they go; others, whenever they go."
  Oscar Wilde

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Strategy for presenting fresh data

2008-06-12 Thread Norberto Meijome
On Wed, 11 Jun 2008 20:49:54 -0700 (PDT)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Hi James,
> 
> Yes, this makes sense.  I've recommended doing the same to others before.  It 
> would be good to have this be a part of Solr.  There is one person (named 
> Jason) working on adding more real-time search support to both Lucene and 
> Solr.

v. interesting - do you have any pointers handy on this?

In the meantime, I had imagined that, although clumsy,  federated search could 
be used for this purpose - posting the new documents to a group of servers 
('latest updates servers') with v limited amount of documents with v. fast 
"reload / refresh" times, and sending them again (on a work queue, possibly), 
to the 'core servers'. Regularly cleaning the 'latest updates servers' of the 
already posted documents to 'core servers' would keep them lean...  of course, 
this approach sucks compared to a proper solution like what James is suggesting 
:)

B
_
{Beto|Norberto|Numard} Meijome

"Ugly programs are like ugly suspension bridges: they're much more liable to 
collapse than pretty ones, because the way humans (especially engineer-humans) 
perceive beauty is intimately related to our ability to process and understand  
complexity. A language that makes it hard to write elegant code makes it hard 
to write good code."
   Eric Raymond

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.
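
For what it's worth, the 'latest updates servers' idea above can also be
expressed at query time with the distributed search in the Solr 1.3 nightlies:
list the fresh tier and the core tier together in the shards parameter and the
results are merged. A sketch with hypothetical host names and query:

  /select?q=foo&shards=fresh1:8983/solr,main1:8983/solr,main2:8983/solr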


Re: searching only within allowed documents

2008-06-12 Thread Geoffrey Young



climbingrose wrote:

It depends on your query. The second query is better if you know that the
fieldb:bar filter query will be reused often, since it will be cached separately
from the main query. The first query occupies one cache entry while the second
one occupies two cache entries, one in the queryCache and one in the
filterCache. Therefore, if you're not going to reuse fieldb:bar, the first
query is better.


ok, that makes more sense.  thanks.

--Geoff
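
For reference, the two request forms being compared, as sketches using the field
names from the quote above. The first is cached as a single entry in the query
result cache; the second additionally caches fieldb:bar in the filter cache,
where other queries can reuse it:

  /select?q=fielda:foo+AND+fieldb:bar
  /select?q=fielda:foo&fq=fieldb:bar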


Error loading class 'solr.RandomSortField'

2008-06-12 Thread rohit arora

Hi,

I configured multicore in Solr/Lucene, but when running the "java -jar start.jar"
command it threw an error:

"Caused by: java.lang.ClassNotFoundException: solr.RandomSortField"

Can you help me with this problem?

with regards
 Rohit Arora



  

Re: Strategy for presenting fresh data

2008-06-12 Thread Otis Gospodnetic
What you are describing is pretty much what the original poster intends to do, 
as far as I understand.


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Norberto Meijome <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, June 12, 2008 9:13:39 AM
> Subject: Re: Strategy for presenting fresh data
> 
> On Wed, 11 Jun 2008 20:49:54 -0700 (PDT)
> Otis Gospodnetic wrote:
> 
> > Hi James,
> > 
> > Yes, this makes sense.  I've recommended doing the same to others before.  
> > It 
> would be good to have this be a part of Solr.  There is one person (named 
> Jason) 
> working on adding more real-time search support to both Lucene and Solr.
> 
> v. interesting - do you have any pointers handy on this?
> 
> In the meantime, I had imagined that, although clumsy,  federated search 
> could 
> be used for this purpose - posting the new documents to a group of servers 
> ('latest updates servers') with v limited amount of documents with v. fast 
> "reload / refresh" times, and sending them again (on a work queue, possibly), 
> to 
> the 'core servers'. Regularly cleaning the 'latest updates servers' of the 
> already posted documents to 'core servers' would keep them lean...  of 
> course, 
> this approach sucks compared to a proper solution like what James is 
> suggesting 
> :)
> 
> B
> _
> {Beto|Norberto|Numard} Meijome
> 
> "Ugly programs are like ugly suspension bridges: they're much more liable to 
> collapse than pretty ones, because the way humans (especially 
> engineer-humans) 
> perceive beauty is intimately related to our ability to process and 
> understand  
> complexity. A language that makes it hard to write elegant code makes it hard 
> to 
> write good code."
>Eric Raymond
> 
> I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
> Reading disclaimers makes you go blind. Writing them is worse. You have been 
> Warned.



Re: Strategy for presenting fresh data

2008-06-12 Thread James Brady


> In the meantime, I had imagined that, although clumsy, federated search could
> be used for this purpose - posting the new documents to a group of servers
> ('latest updates servers') with v limited amount of documents with v. fast
> "reload / refresh" times, and sending them again (on a work queue, possibly),
> to the 'core servers'. Regularly cleaning the 'latest updates servers' of the
> already posted documents to 'core servers' would keep them lean... of course,
> this approach sucks compared to a proper solution like what James is suggesting
> :)




Otis - is there an issue I should be looking at for more information  
on this?


Yes, in principle, sending updates both to a fresh, forgetful and fast  
index and a larger, slower index is what I'm thinking of doing.


The only difference is that I'm talking about having the fresh index  
be implemented as a RAMDirectory in the same JVM as the large index.


This means that I can avoid the slowness of cross-disk or cross-machine
replication, I can avoid having to index all documents in two
places and I cut out the extra moving part of federated search.


On the other hand, I am going to have to write my own piece to handle  
the index flushes and federate searches to the fast and large indices.


Thanks for your input!
James

Re: Error loading class 'solr.RandomSortField'

2008-06-12 Thread Shalin Shekhar Mangar
Hi Rohit,

It seems like you may be using a Solr 1.2 war file. You must use the 1.3
nightly builds to use the newer multicore features.

http://people.apache.org/builds/lucene/solr/nightly/
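
For reference, solr.RandomSortField is wired into the 1.3 example schema roughly
like this (a sketch; check the nightly's schema.xml for the exact form), which is
why a 1.2 war that does not ship the class fails to load the schema:

  <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
  <dynamicField name="random*" type="random"/>

Sorting then uses a matching dynamic field name, e.g. sort=random_1234 asc.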

On Thu, Jun 12, 2008 at 2:03 PM, rohit arora <[EMAIL PROTECTED]>
wrote:

>
> Hi,
>
> I configured multicore in Solr/Lucene, but when running the "java -jar
> start.jar" command it threw an error:
>
> "Caused by: java.lang.ClassNotFoundException: solr.RandomSortField"
>
> Can you help me with this problem?
>
> with regards
>  Rohit Arora
>
>
>
>




-- 
Regards,
Shalin Shekhar Mangar.


Re: Analytics e.g. "Top 10 searches"

2008-06-12 Thread Alexander Ramos Jardim
I keep this information on a separate index that I call moreSearchedWords. I
use it to generate tag clouds

2008/6/6 Matthew Runo <[EMAIL PROTECTED]>:

> I'm nearly certain that everyone who maintains these stats does it
> themselves in their 'front end'. It's very easy to log terms and whatever
> else just before or after sending the query off to Solr.
>
> Thanks!
>
> Matthew Runo
> Software Developer
> Zappos.com
> 702.943.7833
>
>
> On Jun 6, 2008, at 3:51 AM, McBride, John wrote:
>
>
>> Hello,
>>
>> Is anybody familiar with any SOLR-based analytical tools which would
>> allow us to extract "top ten seaches", for example.
>>
>> I imagine at the query parse level, where the query is tokenized and
>> filtered would be the best place to log this, due to the many
>> permutations possible at the user input level.
>>
>> Is there an existing plugin to do this, or could you suggest how to
>> architect this?
>>
>> Thanks,
>> John
>>
>>
>


-- 
Alexander Ramos Jardim


Re: Re[2]: "null" in admin page

2008-06-12 Thread Alexander Ramos Jardim
Sorry for the late response. Too many messages, I got distracted.

Steps as follows:
1. I download the solr example app.
2. Unpack it.
3. cd 
4. java -jar start.jar
5. Try to use one of the links in the admin webapp
6. Get core=null

2008/5/30 Chris Hostetter <[EMAIL PROTECTED]>:

>
> : It surely comes on the example, as I got this problem all times I get the
> : example, and I have to remove the file multicore.xml or I get the error.
>
> something is wrong then.  if you are running "java -jar start.jar" in the
> "example" directory then "example/solr" will be used as your solr home
> directory, and it has no multicore configuration files.
> The directory "example/multicore" does contain a multicore.xml file,
> and if you use that directory as your Solr Home then Solr will see that
> file and go into "MultiCore" -- but you shouldn't have to remove that
> multicore.xml file unless you are explicitly using "example/multicore".
>
> If you are seeing different behavior, can you describe in more detail what
> steps you are taking (from a clean checkout) and why you think
> multicore.xml is getting used when you do those steps?
>
>
> -Hoss
>
>


-- 
Alexander Ramos Jardim


Re: Re: Analytics e.g. "Top 10 searches"

2008-06-12 Thread Alexander Ramos Jardim
Hello Jon,

These are the fields in my search index:
  [the field definitions were stripped of their XML tags by the archive; their
  descriptions were:]
  -- Where on the site this search was made
  -- Search text
  -- Number of times this search was made

How it works:
1. When someone hits the search functionality, I put the search made on a
JMS queue to process search statistics asynchronously.
2. The search information in the JMS queue is read at short time intervals and
condensed. This way I get beans that contain exactly the information that I
want to put on the index.
3. I retrieve all the X most-executed searches sorted by hits and update
their information using the one I got in (2).
4. I empty the index.
5. I update the search index using the information generated in (3).
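
For reference, the index Alexander describes would look roughly like this in
schema.xml; the real field names were stripped by the archive, so "site", "query"
and "hits" below are hypothetical, and "sint" is the sortable integer type from
the example schema (needed so step 3 can sort by hits):

  <field name="site"  type="string" indexed="true" stored="true"/>
  <field name="query" type="string" indexed="true" stored="true"/>
  <field name="hits"  type="sint"   indexed="true" stored="true"/>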

2008/6/12 Jon Lehto <[EMAIL PROTECTED]>:

> Are you doing anything 'fancy'?
>
> Thanks,
> Jon
>
>
> =
> From: Alexander Ramos Jardim <[EMAIL PROTECTED]>
> Date: 2008/06/12 Thu PM 01:23:22 EDT
> To: solr-user@lucene.apache.org
> Subject: Re: Analytics e.g. "Top 10 searches"
>
> I keep this information on a separate index that I call moreSearchedWords.
> I
> use it to generate tag clouds
>
> 2008/6/6 Matthew Runo <[EMAIL PROTECTED]>:
>
> > I'm nearly certain that everyone who maintains these stats does it
> > themselves in their 'front end'. It's very easy to log terms and whatever
> > else just before or after sending the query off to Solr.
> >
> > Thanks!
> >
> > Matthew Runo
> > Software Developer
> > Zappos.com
> > 702.943.7833
> >
> >
> > On Jun 6, 2008, at 3:51 AM, McBride, John wrote:
> >
> >
> >> Hello,
> >>
> >> Is anybody familiar with any SOLR-based analytical tools which would
> >> allow us to extract "top ten seaches", for example.
> >>
> >> I imagine at the query parse level, where the query is tokenized and
> >> filtered would be the best place to log this, due to the many
> >> permutations possible at the user input level.
> >>
> >> Is there an existing plugin to do this, or could you suggest how to
> >> architect this?
> >>
> >> Thanks,
> >> John
> >>
> >>
> >
>
>
> --
> Alexander Ramos Jardim
>
>


-- 
Alexander Ramos Jardim


Re: Problem with add a XML

2008-06-12 Thread Yonik Seeley
You need to define fields in the schema.xml (and otherwise change the
schema to match your data).
-Yonik
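
For illustration, a minimal sketch of the kind of declarations Yonik means; the
field names guid, beschreibung and anwendung come from Thomas's schema as it
appears later in this digest, and the types are assumptions:

  <field name="guid"         type="string" indexed="true" stored="true" required="true"/>
  <field name="beschreibung" type="text"   indexed="true" stored="true"/>
  <field name="anwendung"    type="string" indexed="true" stored="true"/>

  <uniqueKey>guid</uniqueKey>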

On Wed, Jun 11, 2008 at 3:46 AM, Thomas Lauer <[EMAIL PROTECTED]> wrote:
> [the posted XML document was stripped of its tags by the list archive; its
> field values were:]
>   85f4fdf9-e596-4974-a5b9-57778e38067b
>   143885
>   28.10.2005 13:06:15
>   Rechnung 2005-025235
>   Rechnungsduplikate
>   2002
>   330T.doc
>   KIS
>   Bonow
>   25906
>   Hofma GmbH
>   Mandant


Re: Re: Analytics e.g. "Top 10 searches"

2008-06-12 Thread Shalin Shekhar Mangar
Just as a thought, would it be possible to expose the original query text
from the QueryResultCache keys (Query) somehow? If that is possible, it
would allow us to query the top N most frequent queries anytime for
reasonable values of N.

On Fri, Jun 13, 2008 at 12:18 AM, Alexander Ramos Jardim <
[EMAIL PROTECTED]> wrote:

> Hello Jon,
>
> These are the fields in my search index:
>   -- Where on the site this search was made
>   -- Search text
>   -- Number of times this search was made
>
> How it works:
> 1. When someone hits the search functionality, I put the search made on a
> JMS queue to process search statistics asynchronously.
> 2. The search information in the JMS queue is read at short time intervals and
> condensed. This way I get beans that contain exactly the information that I
> want to put on the index.
> 3. I retrieve all the X most-executed searches sorted by hits and update
> their information using the one I got in (2).
> 4. I empty the index.
> 5. I update the search index using the information generated in (3).
>
> 2008/6/12 Jon Lehto <[EMAIL PROTECTED]>:
>
> > Are you doing anything 'fancy'?
> >
> > Thanks,
> > Jon
> >
> >
> > =
> > From: Alexander Ramos Jardim <[EMAIL PROTECTED]>
> > Date: 2008/06/12 Thu PM 01:23:22 EDT
> > To: solr-user@lucene.apache.org
> > Subject: Re: Analytics e.g. "Top 10 searches"
> >
> > I keep this information on a separate index that I call
> moreSearchedWords.
> > I
> > use it to generate tag clouds
> >
> > 2008/6/6 Matthew Runo <[EMAIL PROTECTED]>:
> >
> > > I'm nearly certain that everyone who maintains these stats does it
> > > themselves in their 'front end'. It's very easy to log terms and
> whatever
> > > else just before or after sending the query off to Solr.
> > >
> > > Thanks!
> > >
> > > Matthew Runo
> > > Software Developer
> > > Zappos.com
> > > 702.943.7833
> > >
> > >
> > > On Jun 6, 2008, at 3:51 AM, McBride, John wrote:
> > >
> > >
> > >> Hello,
> > >>
> > >> Is anybody familiar with any SOLR-based analytical tools which would
> > >> allow us to extract "top ten seaches", for example.
> > >>
> > >> I imagine at the query parse level, where the query is tokenized and
> > >> filtered would be the best place to log this, due to the many
> > >> permutations possible at the user input level.
> > >>
> > >> Is there an existing plugin to do this, or could you suggest how to
> > >> architect this?
> > >>
> > >> Thanks,
> > >> John
> > >>
> > >>
> > >
> >
> >
> > --
> > Alexander Ramos Jardim
> >
> >
>
>
> --
> Alexander Ramos Jardim
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Re: Analytics e.g. "Top 10 searches"

2008-06-12 Thread Yonik Seeley
On Thu, Jun 12, 2008 at 3:04 PM, Shalin Shekhar Mangar
<[EMAIL PROTECTED]> wrote:
> Just as a thought, would it be possible to expose the original query text
> from the QueryResultCache keys (Query) somehow? If that is possible, it
> would allow us to query the top N most frequent queries anytime for
> reasonable values of N.

That would only give most recent, not most frequent.

-Yonik


Re: Num docs

2008-06-12 Thread Marcus Herou
Cacti, Nagios you name it already in use :)

Well I'm the CTO so the one really really interested in estimating perf.

The id's come from a db initially and is later used for retrieval from a
distributed on disk caching system which I have written.
I'm in the process of moving from MySQL to HBase or Hypertable.

/M

On Tue, Jun 10, 2008 at 10:03 PM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> Marcus,
>
> It sounds like you may just want to use a good server monitoring package
> that collects server data and prints out pretty charts.  Then you can show
> them to your IT/budget people when the charts start showing increased query
> latency times, very little available RAM, swapping, high CPU usage and such.
>  Nagios, Ganglia, any of those things will do.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> - Original Message 
> > From: Marcus Herou <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Tuesday, June 10, 2008 3:29:40 PM
> > Subject: Re: Num docs
> >
> > Well guys you are right... Still I want to have a clue about how much
> each
> > machine stores to predict when we need more machines (measure performance
> > degradation per new document). But it's harder to collect that kind of
> data.
> > It sure is doable no doubt and is a normal sharding "algo" for MySQL.
> >
> > The best approach I think is to have some bg threads run X number of
> queries
> > and collect the response times, throw away the n lowest/highest response
> > times and calc an avg time which is used for in sharding and query
> lb'ing.
> >
> > Little off topic but interesting
> > What would you guys say about a good correlation between the index size
> on
> > disk (no stored text content) and available RAM and having good response
> > times.
> >
> > How long is a rope would you perhaps say...but I think some rule of thumb
> > could be established...
> >
> > One of the schemas of concern
> >
> > [the field definitions were stripped of their XML tags by the list archive]
> >
> > and a normal solr query (taken from the log):
> > /select?start=0&q=(title:(apple)^4+OR+description:(apple))&version=2.2&rows=15&wt=xml&sort=publishDate+desc
> >
> >
> > //Marcus
> >
> >
> >
> >
> >
> > On Tue, Jun 10, 2008 at 1:15 AM, Otis Gospodnetic <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Exactly.  I think I mentioned this once before several months ago.  One
> can
> > > take various hardware specs (# cores, CPU speed, FSB, RAM, etc.),
> > > performance numbers, etc. and come up with a number for each server's
> > > overall capacity.
> > >
> > >
> > > As a matter of fact, I think this would be useful to have right in
> Solr,
> > > primarily for use when allocating and sizing shards for Distributed
> Search.
> > >  JIRA enhancement/feature issue?
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > >
> > > - Original Message 
> > > > From: Alexander Ramos Jardim
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Monday, June 9, 2008 6:42:17 PM
> > > > Subject: Re: Num docs
> > > >
> > > > I even think that such a decision should be based on the overall
> machine
> > > > performance at a given time, and not the index size. Unless you are
> > > talking
> > > > solely about HD space and not having any performance issues.
> > > >
> > > > 2008/6/7 Otis Gospodnetic :
> > > >
> > > > > Marcus,
> > > > >
> > > > >
> > > > > For that you can rely on du, vmstat, iostat, top and such, too. :)
> > > > >
> > > > > Otis
> > > > > --
> > > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > > >
> > > > >
> > > > > - Original Message 
> > > > > > From: Marcus Herou
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Sent: Saturday, June 7, 2008 12:33:10 PM
> > > > > > Subject: Re: Num docs
> > > > > >
> > > > > > Thanks, I wanna ask the indices how much more each shard can
> handle
> > > > > before
> > > > > > they're considered "full" and scream for a budget to get a new
> > > machine :)
> > > > > >
> > > > > > /M
> > > > > >
> > > > > > On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic
> > > > > > wrote:
> > > > > >
> > > > > > > Marcus, check out the Luke request handler.  You can get it
> from
> > > its
> > > > > > > output.  It may also be possible to get *just* that number, but
> I'm
> > > not
> > > > > > > looking at docs/code right now to know for sure.
> > > > > > >
> > > > > > >  Otis
> > > > > > > --
> > > > > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > > > > >
> > > > > > >
> > > > > > > - Original Message 
> > > > > > > >

Re: Re: Analytics e.g. "Top 10 searches"

2008-06-12 Thread Shalin Shekhar Mangar
Ah! I see. But can the original queries be exposed? I guess exposing this
through a SearchComponent would be appropriate. This can help in displaying
things like "What users are searching for right now?"

On Fri, Jun 13, 2008 at 12:44 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Thu, Jun 12, 2008 at 3:04 PM, Shalin Shekhar Mangar
> <[EMAIL PROTECTED]> wrote:
> > Just as a thought, would it be possible to expose the original query text
> > from the QueryResultCache keys (Query) somehow? If that is possible, it
> > would allow us to query the top N most frequent queries anytime for
> > reasonable values of N.
>
> That would only give most recent, not most frequent.
>
> -Yonik
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Problem with add a XML

2008-06-12 Thread Thomas Lauer

This is the error message from the console.

SCHWERWIEGEND: org.apache.lucene.store.LockObtainFailedException: Lock
obtain timed out: [EMAIL PROTECTED]:\Dokumente und E
instellungen\tla\Desktop\solr\apache-solr-1.2.0\apache-solr-1.2.0\example\solr\data\index\write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:70)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
at org.apache.lucene.index.IndexWriter.(IndexWriter.java:341)
at
org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:65)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
at
org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
at
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)





Jón Helgi Jónsson wrote:
> 
> Usually you get better error messages from the start.jar console, you
> don't see anything there?
> 
> On Thu, Jun 12, 2008 at 7:49 AM, Thomas Lauer <[EMAIL PROTECTED]> wrote:
>>
>> Yes, my file is UTF-8. I have uploaded my file.
>>
>>
>>
>>
>> Grant Ingersoll-6 wrote:
>>>
>>>
>>> On Jun 11, 2008, at 3:46 AM, Thomas Lauer wrote:
>>>
 now I want to add the files to Solr. I have started Solr on Windows
 in the example directory with java -jar start.jar


 I have the following Error Message:

 C:\test\output>java -jar post.jar *.xml
 SimplePostTool: version 1.2
 SimplePostTool: WARNING: Make sure your XML documents are encoded in
 UTF-8, other encodings are not currently supported
>>>
>>>
>>> This is your issue right here.  You have to save that second file in
>>> UTF-8.
>>>

 SimplePostTool: POSTing files to http://localhost:8983/solr/update..
 SimplePostTool: POSTing file 1.xml
 SimplePostTool: POSTing file 2.xml
 SimplePostTool: FATAL: Connection error (is Solr running at
 http://localhost:8983/solr/update
  ?): java.io.IOException: S
 erver returned HTTP response code: 400 for URL:
 http://localhost:8983/solr/update

 C:\test\output>

 Regards Thomas Lauer





>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>> http://www.nabble.com/file/p17794387/2.xml 2.xml
>> --
>> View this message in context:
>> http://www.nabble.com/Problem-with-add-a-XML-tp17772018p17794387.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Problem-with-add-a-XML-tp17772018p17808276.html
Sent from the Solr - User mailing list

Re: Problem with add a XML

2008-06-12 Thread Yonik Seeley
That can happen if the JVM died or got a critical error.
You can remove the lock file manually, or configure Solr to remove it
automatically on startup (see solrconfig.xml).

-Yonik
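
For reference, the switch Yonik mentions lives in the mainIndex section of
solrconfig.xml; a sketch (use it with care, since it discards the lock
unconditionally at startup):

  <mainIndex>
    ...
    <unlockOnStartup>true</unlockOnStartup>
  </mainIndex>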

On Thu, Jun 12, 2008 at 3:57 PM, Thomas Lauer <[EMAIL PROTECTED]> wrote:
>
> This is the error message from the console.
>
> SCHWERWIEGEND: org.apache.lucene.store.LockObtainFailedException: Lock
> obtain timed out: [EMAIL PROTECTED]:\Dokumente und E
> instellungen\tla\Desktop\solr\apache-solr-1.2.0\apache-solr-1.2.0\example\solr\data\index\write.lock
>at org.apache.lucene.store.Lock.obtain(Lock.java:70)
>at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
>at org.apache.lucene.index.IndexWriter.(IndexWriter.java:341)
>at
> org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:65)
>at
> org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
>at
> org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
>at
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
>at
> org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
>at
> org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
>at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
>at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>at org.mortbay.jetty.Server.handle(Server.java:285)
>at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>at
> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>at
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>at
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>
>
>
>
>
> Jón Helgi Jónsson wrote:
>>
>> Usually you get better error messages from the start.jar console, you
>> don't see anything there?
>>
>> On Thu, Jun 12, 2008 at 7:49 AM, Thomas Lauer <[EMAIL PROTECTED]> wrote:
>>>
>>> Yes, my file is UTF-8. I have uploaded my file.
>>>
>>>
>>>
>>>
>>> Grant Ingersoll-6 wrote:


 On Jun 11, 2008, at 3:46 AM, Thomas Lauer wrote:

> now I want to add the files to Solr. I have started Solr on Windows
> in the example directory with java -jar start.jar
>
>
> I have the following Error Message:
>
> C:\test\output>java -jar post.jar *.xml
> SimplePostTool: version 1.2
> SimplePostTool: WARNING: Make sure your XML documents are encoded in
> UTF-8, other encodings are not currently supported


 This is your issue right here.  You have to save that second file in
 UTF-8.

>
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> SimplePostTool: POSTing file 1.xml
> SimplePostTool: POSTing file 2.xml
> SimplePostTool: FATAL: Connection error (is Solr running at
> http://localhost:8983/solr/update
>  ?): java.io.IOException: S
> erver returned HTTP response code: 400 for URL:
> http://localhost:8983/solr/update
>
> C:\test\output>
>
> Regards Thomas Lauer
>
>
>
>
>

 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ






>

Re: Strategy for presenting fresh data

2008-06-12 Thread Norberto Meijome
On Thu, 12 Jun 2008 07:14:04 -0700 (PDT)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> What you are describing is pretty much what the original poster intends to 
> do, as far as I understand.


Ah right, I am reading it again in the morning and it makes sense. Thanks for
shaking the cobwebs off my mind :P

B

_
{Beto|Norberto|Numard} Meijome

Lack of planning on your part does not constitute an emergency on ours.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


RE: Strategy for presenting fresh data

2008-06-12 Thread Norskog, Lance
You can also use a shared file system mounted on a common SAN. 
(This is a high-end server configuration.) 

-Original Message-
From: James Brady [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 12, 2008 9:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Strategy for presenting fresh data

>>
>> In the meantime, I had imagined that, although clumsy,  federated 
>> search could be used for this purpose - posting the new documents to 
>> a group of servers ('latest updates servers') with v limited amount 
>> of documents with v. fast "reload / refresh" times, and sending them 
>> again (on a work queue, possibly), to the 'core servers'. Regularly 
>> cleaning the 'latest updates servers'
>> of the
>> already posted documents to 'core servers' would keep them lean...   
>> of course,
>> this approach sucks compared to a proper solution like what James is 
>> suggesting
>> :)
>>


Otis - is there an issue I should be looking at for more information on
this?

Yes, in principle, sending updates both to a fresh, forgetful and fast
index and a larger, slower index is what I'm thinking of doing.

The only difference is that I'm talking about having the fresh index be
implemented as a RAMDirectory in the same JVM as the large index.

This means that I can avoid the slowness of cross-disk or cross-machine
replication, I can avoid having to index all documents in two places and
I cut out the extra moving part of federated search.

On the other hand, I am going to have to write my own piece to handle
the index flushes and federate searches to the fast and large indices.

Thanks for your input!
James


Best type to use for enum-like behavior

2008-06-12 Thread Jon Drukman
I am going to store two totally different types of documents in a single 
solr instance.  Eventually I may separate them into separate instances 
but we are a long way from having either the size or traffic to require 
that.


I read somewhere that a good approach is to add a 'type' field to the 
data and then use a filter query.  What data type would you use for the 
type field?  I could just use an integer but then we have to remember that 
1=user, 2=item, and so on.  In mysql there's an enum type where you use 
text labels that are mapped to integers behind the scenes (good 
performance and user friendly).  Is there something similar in solr or 
should I just use a string?


-jsd-



Re: Best type to use for enum-like behavior

2008-06-12 Thread Erik Hatcher
Just use a string.  Any ol' string that suits your domain will do.   
Just be sure the field type is untokenized (the "string" type in the  
example configuration will do).


Erik
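
For example, a schema.xml sketch of Erik's suggestion, plus the filter-query
usage from the question (type:user and type:item follow the values mentioned
above):

  <field name="type" type="string" indexed="true" stored="true"/>

and at query time:

  /select?q=*:*&fq=type:user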

On Jun 12, 2008, at 8:07 PM, Jon Drukman wrote:

I am going to store two totally different types of documents in a  
single solr instance.  Eventually I may separate them into separate  
instances but we are a long way from having either the size or  
traffic to require that.


I read somewhere that a good approach is to add a 'type' field to  
the data and then use a filter query.  What data type would you use  
for the type field?  I could just use an integer but then we have to  
remember that 1=user, 2=item, and so on.  In mysql there's an enum  
type where you use text labels that are mapped to integers behind  
the scenes (good performance and user friendly).  Is there something  
similar in solr or should I just use a string?


-jsd-




Re: Num docs

2008-06-12 Thread Otis Gospodnetic
Or, if you want to go with something older/more stable, go with BDB. :)


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Marcus Herou <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, June 12, 2008 3:17:52 PM
> Subject: Re: Num docs
> 
> Cacti, Nagios you name it already in use :)
> 
> Well I'm the CTO so the one really really interested in estimating perf.
> 
> The id's come from a db initially and is later used for retrieval from a
> distributed on disk caching system which I have written.
> I'm in the process of moving from MySQL to HBase or Hypertable.
> 
> /M
> 
> On Tue, Jun 10, 2008 at 10:03 PM, Otis Gospodnetic <
> [EMAIL PROTECTED]> wrote:
> 
> > Marcus,
> >
> > It sounds like you may just want to use a good server monitoring package
> > that collects server data and prints out pretty charts.  Then you can show
> > them to your IT/budget people when the charts start showing increased query
> > latency times, very little available RAM, swapping, high CPU usage and such.
> >  Nagios, Ganglia, any of those things will do.
> >
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> > - Original Message 
> > > From: Marcus Herou 
> > > To: solr-user@lucene.apache.org
> > > Sent: Tuesday, June 10, 2008 3:29:40 PM
> > > Subject: Re: Num docs
> > >
> > > Well guys you are right... Still I want to have a clue about how much
> > each
> > > machine stores to predict when we need more machines (measure performance
> > > degradation per new document). But it's harder to collect that kind of
> > data.
> > > It sure is doable no doubt and is a normal sharding "algo" for MySQL.
> > >
> > > The best approach I think is to have some bg threads run X number of
> > queries
> > > and collect the response times, throw away the n lowest/highest response
> > > times and calc an avg time which is used for in sharding and query
> > lb'ing.
> > >
> > > Little off topic but interesting
> > > What would you guys say about a good correlation between the index size
> > on
> > > disk (no stored text content) and available RAM and having good response
> > > times.
> > >
> > > How long is a rope would you perhaps say...but I think some rule of thumb
> > > could be established...
> > >
> > > One of the schemas of concern
> > >
> > > [the field definitions were stripped of their XML tags by the list archive]
> > >
> > > and a normal solr query (taken from the log):
> > > /select?start=0&q=(title:(apple)^4+OR+description:(apple))&version=2.2&rows=15&wt=xml&sort=publishDate+desc
> > >
> > >
> > > //Marcus
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Jun 10, 2008 at 1:15 AM, Otis Gospodnetic <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Exactly.  I think I mentioned this once before several months ago.  One
> > can
> > > > take various hardware specs (# cores, CPU speed, FSB, RAM, etc.),
> > > > performance numbers, etc. and come up with a number for each server's
> > > > overall capacity.
> > > >
> > > >
> > > > As a matter of fact, I think this would be useful to have right in
> > Solr,
> > > > primarily for use when allocating and sizing shards for Distributed
> > Search.
> > > >  JIRA enhancement/feature issue?
> > > > Otis
> > > > --
> > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > >
> > > >
> > > > - Original Message 
> > > > > From: Alexander Ramos Jardim
> > > > > To: solr-user@lucene.apache.org
> > > > > Sent: Monday, June 9, 2008 6:42:17 PM
> > > > > Subject: Re: Num docs
> > > > >
> > > > > I even think that such a decision should be based on the overall
> > machine
> > > > > performance at a given time, and not the index size. Unless you are
> > > > talking
> > > > > solely about HD space and not having any performance issues.
> > > > >
> > > > > 2008/6/7 Otis Gospodnetic :
> > > > >
> > > > > > Marcus,
> > > > > >
> > > > > >
> > > > > > For that you can rely on du, vmstat, iostat, top and such, too. :)
> > > > > >
> > > > > > Otis
> > > > > > --
> > > > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > > > >
> > > > > >
> > > > > > - Original Message 
> > > > > > > From: Marcus Herou
> > > > > > > To: solr-user@lucene.apache.org
> > > > > > > Sent: Saturday, June 7, 2008 12:33:10 PM
> > > > > > > Subject: Re: Num docs
> > > > > > >
> > > > > > > Thanks, I wanna ask the indices how much more each shard can
> > handle
> > > > > > before
> > > > > > > they're considered "full" and scream for a budget to get a

Re: Strategy for presenting fresh data

2008-06-12 Thread Otis Gospodnetic
Hi James,

Right, you'll have to write some custom components.  It may be wiser to spend
your time looking at what Jason R (sorry, can't remember the last name off the
top of my head) put in JIRA (you'll have to search, I don't recall the issue
IDs).

Actually, having a full Solr email folder helps sometime - see SOLR-564.

Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: James Brady <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, June 12, 2008 12:58:44 PM
> Subject: Re: Strategy for presenting fresh data
> 
> >>
> >> In the meantime, I had imagined that, although clumsy,  federated  
> >> search could
> >> be used for this purpose - posting the new documents to a group of  
> >> servers
> >> ('latest updates servers') with v limited amount of documents with  
> >> v. fast
> >> "reload / refresh" times, and sending them again (on a work queue,  
> >> possibly), to
> >> the 'core servers'. Regularly cleaning the 'latest updates servers'  
> >> of the
> >> already posted documents to 'core servers' would keep them lean...  
> >> of course,
> >> this approach sucks compared to a proper solution like what James  
> >> is suggesting
> >> :)
> >>
> 
> 
> Otis - is there an issue I should be looking at for more information  
> on this?
> 
> Yes, in principle, sending updates both to a fresh, forgetful and fast  
> index and a larger, slower index is what I'm thinking of doing.
> 
> The only difference is that I'm talking about having the fresh index  
> be implemented as a RAMDirectory in the same JVM as the large index.
> 
> This means that I can avoid the slowness of cross-disk or cross- 
> machine replication, I can avoid having to index all documents in two  
> places and I cut out the extra moving part of federated search.
> 
> On the other hand, I am going to have to write my own piece to handle  
> the index flushes and federate searches to the fast and large indices.
> 
> Thanks for your input!
> James



My First Solr

2008-06-12 Thread Thomas Lauer
Hi,

I have installed my first Solr on Tomcat. I have modified my schema.xml
for my XMLs and I have imported some XML files with post.jar.

Tomcat runs
solr/admin runs

post.jar imports files


but I can't find my files.

The response is always

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
      <lst name="params">
        <str name="rows">10</str>
        <str name="start">0</str>
        <str name="indent">on</str>
        <str name="q">KIS</str>
        <str name="version">2.2</str>
      </lst>
    </lst>
    <result name="response" numFound="0" start="0"/>
  </response>

My files are in the attachment.

Regards Thomas

[The attached configuration and sample document were stripped of their XML tags
by the list archive. The readable remnants show that the uniqueKey field is
"guid", the default search field is "beschreibung", and the sample document
carried the following field values:

  00098d72-c03a-4075-b8af-80bd7d6fd7c5
  143882
  28.10.2005 13:05:52
  Rechnung 2005-025232
  Rechnungsduplikate
  2002
  330Q.doc
  KIS
  Bonow
  29536
  Guardus Solutions AG
  Mandant]

Re: My First Solr

2008-06-12 Thread Brian Carmalt
Hello Thomas, 

Have you performed a commit? Try adding <commit/> as the last line of
the document you are adding. I would suggest you read up on commits,
how often you should perform them, and how to do auto-commits.

Brian
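
For illustration, a minimal update message sketch -- the guid value is taken
from the sample document in Thomas's mail, and the remaining fields are omitted
since the attachment's tags were stripped -- followed by the commit sent as its
own request:

  <add>
    <doc>
      <field name="guid">00098d72-c03a-4075-b8af-80bd7d6fd7c5</field>
      <!-- remaining fields omitted -->
    </doc>
  </add>

  <commit/>

Note that post.jar itself issues a commit after posting, as its "COMMITting Solr
index changes" output shows.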

On Friday, 13.06.2008 at 07:20 +0200, Thomas Lauer wrote:
> Hi,
> 
> I have installed my first Solr on Tomcat. I have modified my schema.xml
> for my XMLs and I have imported some XML files with post.jar.
> 
> Tomcat runs
> solr/admin runs
> 
> post.jar imports files
> 
> 
> but I can't find my files.
> 
> The response is always
> 
>   
>   
>
> 0
> 0
> 
>  10
>  0
>  on
>  KIS
>  2.2
> 
>
>   
>   
> 
> My files in the attachment
> 
> Regards Thomas
> 
> 



AW: My First Solr

2008-06-12 Thread Thomas Lauer
Hi Brian,

I have tested:
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, 
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file import_sample.xml
SimplePostTool: COMMITting Solr index changes..

but I can't find the document.

Regards
Thomas

-Original Message-
From: Brian Carmalt [mailto:[EMAIL PROTECTED]
Sent: Friday, 13 June 2008 07:36
To: solr-user@lucene.apache.org
Subject: Re: My First Solr

Hello Thomas,

Have you performed a commit? Try adding <commit/> as the last line of
the document you are adding. I would suggest you read up on commits and
how often you should perform them and how to do auto commits.

Brian

On Friday, 13.06.2008 at 07:20 +0200, Thomas Lauer wrote:
> Hi,
>
> I have installed my first Solr on Tomcat. I have modified my schema.xml
> for my XMLs and I have imported some XML files with post.jar.
>
> Tomcat runs
> solr/admin runs
>
> post.jar imports files
>
>
> but I can't find my files.
>
> The response is always
>
>   
>   
>
> 0
> 0
> 
>  10
>  0
>  on
>  KIS
>  2.2
> 
>
>   
>   
>
> My files in the attachment
>
> Regards Thomas
>
>



Re: My First Solr

2008-06-12 Thread Geert Van Huychem

Hi,

The string you're looking for is in the "anwendung" field, while your
default search field is the "beschreibung" field. Try specifying the field
in your query, like this: anwendung:"KIS".


Geert

Thomas Lauer wrote:

Hi,

I have installed my first Solr on Tomcat. I have modified my schema.xml
for my XMLs and I have imported some XML files with post.jar.

Tomcat runs
solr/admin runs

post.jar imports files


but I can't find my files.

The response is always

  
  
   
0
0

 10
 0
 on
 KIS
 2.2

   
  
  

My files in the attachment

Regards Thomas


  




Re: AW: My First Solr

2008-06-12 Thread Brian Carmalt
Do you see if the document update is successful? When you start Solr with
java -jar start.jar for the example, Solr will list the document ids
of the docs that you are adding and tell you how long the update took.

A simple but brute-force method to find out if a document has been
committed is to stop the server and then restart it.

You can also use the solr/admin/stats.jsp page to see if the docs are
there.

After looking at your query in the results you posted, I would bet that
you are not specifying a search field. Try searching for "anwendung:KIS"
or "id:[1 TO *]" to see all the docs in your index.

Brian

On Friday, 13.06.2008 at 07:40 +0200, Thomas Lauer wrote:
> i have tested:
> SimplePostTool: version 1.2
> SimplePostTool: WARNING: Make sure your XML documents are encoded in
> UTF-8, other encodings are not currently supported
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> SimplePostTool: POSTing file import_sample.xml
> SimplePostTool: COMMITting Solr index changes.. 



AW: AW: My First Solr

2008-06-12 Thread Thomas Lauer
OK, I can find my files now. Can I make all fields the default search field?

Regards Thomas

-Original Message-
From: Brian Carmalt [mailto:[EMAIL PROTECTED]
Sent: Friday, 13 June 2008 08:03
To: solr-user@lucene.apache.org
Subject: Re: AW: My First Solr

Do you see if the document update is successful? When you start Solr with
java -jar start.jar for the example, Solr will list the document ids
of the docs that you are adding and tell you how long the update took.

A simple but brute-force method to find out if a document has been
committed is to stop the server and then restart it.

You can also use the solr/admin/stats.jsp page to see if the docs are
there.

After looking at your query in the results you posted, I would bet that
you are not specifying a search field. Try searching for "anwendung:KIS"
or "id:[1 TO *]" to see all the docs in your index.

Brian

On Friday, 13.06.2008 at 07:40 +0200, Thomas Lauer wrote:
> i have tested:
> SimplePostTool: version 1.2
> SimplePostTool: WARNING: Make sure your XML documents are encoded in
> UTF-8, other encodings are not currently supported
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> SimplePostTool: POSTing file import_sample.xml
> SimplePostTool: COMMITting Solr index changes..






Re: AW: AW: My First Solr

2008-06-12 Thread Brian Carmalt
The DisMaxQueryHandler is your friend. 
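
The copyField approach from the "Couple of questions on solr" thread earlier in
this digest is another option: copy everything searchable into one catch-all
field and make it the default. A sketch using the field names visible in
Thomas's schema (types and the set of sources are assumptions):

  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="beschreibung" dest="text"/>
  <copyField source="anwendung" dest="text"/>

  <defaultSearchField>text</defaultSearchField>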

On Friday, 13.06.2008 at 08:29 +0200, Thomas Lauer wrote:
> OK, I can find my files now. Can I make all fields the default search field?
> 
> Regards Thomas
> 
> -Original Message-
> From: Brian Carmalt [mailto:[EMAIL PROTECTED]
> Sent: Friday, 13 June 2008 08:03
> To: solr-user@lucene.apache.org
> Subject: Re: AW: My First Solr
> 
> Do you see if the document update is successful? When you start Solr with
> java -jar start.jar for the example, Solr will list the document ids
> of the docs that you are adding and tell you how long the update took.
> 
> A simple but brute-force method to find out if a document has been
> committed is to stop the server and then restart it.
> 
> You can also use the solr/admin/stats.jsp page to see if the docs are
> there.
> 
> After looking at your query in the results you posted, I would bet that
> you are not specifying a search field. Try searching for "anwendung:KIS"
> or "id:[1 TO *]" to see all the docs in your index.
> 
> Brian
> 
> On Friday, 13.06.2008 at 07:40 +0200, Thomas Lauer wrote:
> > i have tested:
> > SimplePostTool: version 1.2
> > SimplePostTool: WARNING: Make sure your XML documents are encoded in
> > UTF-8, other encodings are not currently supported
> > SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> > SimplePostTool: POSTing file import_sample.xml
> > SimplePostTool: COMMITting Solr index changes..
> 
> 
> 
>