questions on query format

2011-10-23 Thread Memory Makers
Hi,

I've spent quite some time reading up on the query format and can't seem to
solve this problem:

1. If I send Solr the following query:
  q={!lucene}profile_description:*

  I get what I would expect.

2. If I send Solr the following query:
  q=*:*

  I get nothing, just an empty response:
  <result name="response" numFound="0" start="0" maxScore="0.0"/>
  <lst name="highlighting"/>

Would appreciate some insight into what is going on.

Thanks.


Re: questions on query format

2011-10-24 Thread Memory Makers
Thanks,

?q.alt=*:* worked for me -- how do I make sure that the standard query
parser is configured?

Thanks.

MM.


On Mon, Oct 24, 2011 at 2:47 AM, Ahmet Arslan  wrote:

> > 2. If I send Solr the following query:
> >   q=*:*
> >
> >   I get nothing, just:
> >   <result name="response" numFound="0" start="0" maxScore="0.0"/>
> >   <lst name="highlighting"/>
> >
> > Would appreciate some insight into what is going on.
>
> If you are using dismax as the query parser, then *:* won't function as a
> match-all-docs query. To retrieve all docs with dismax, use the q.alt=*:*
> parameter. Also, adding debugQuery=on will display information about parsed
> query.
>
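
For reference, here is a sketch of how such a request can be assembled (Python; the localhost URL, core path, and handler settings are assumptions, not taken from the thread). The parser in effect is whatever defType the request handler sets in solrconfig.xml; if none is set, the default lucene parser applies, which is why debugQuery=on is the quickest way to confirm what actually parsed:

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed endpoint

def dismax_match_all_url(rows=10):
    """Build a match-all request for a dismax handler: q.alt carries the
    *:* query because dismax treats q=*:* as literal text."""
    params = {
        "defType": "dismax",
        "q.alt": "*:*",      # match-all fallback used when q is absent
        "rows": rows,
        "debugQuery": "on",  # shows the parsed query in the response
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(dismax_match_all_url())
```

Running this only builds and prints the URL; curl it against a live Solr to see the parsed-query section of the debug output.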


Is there a good web front end application / interface for solr

2011-10-24 Thread Memory Makers
Greetings guys,

Is there a good front end application / interface for solr?

Features I'm looking for are:
  configure query interface (using non-programmatic features)
  configure pagination
  configure bookmarking of results
  export results of a query to a csv or other format (JSON, etc.)

  Is there any demand for such an application?

Thanks.


Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Memory Makers
Looks very interesting -- actually I looked at it a while back, but in a
different context. For a non-RoR person, how much of a learning curve is it
to set up?

Thanks.

On Tue, Oct 25, 2011 at 5:49 AM, Erik Hatcher wrote:

> Blacklight - http://projectblacklight.org/
>
> It's a full featured application fronting Solr.  It's Ruby on Rails based,
> and powers many library front-ends but is becoming much more general purpose
> for other domains.  See examples here:
> https://github.com/projectblacklight/blacklight/wiki/Examples
>
> Also, the forensics domain has used it as well, as mentioned in the slides
> and talk I attended at Lucene Revolution earlier this year: <
> http://www.lucidimagination.com/blog/2011/06/01/solr-and-law-enforcement-highly-relevant-results-can-be-a-crime/
> >
>
> Often the decision for an application layer like this is determined by the
> programming language and frameworks used.  Blacklight is "opinionated" (as
> any other concrete implementation would be) in this regard.  If it fits your
> tastes, it's a great technology to use.
>
>Erik
>
>
> On Oct 24, 2011, at 15:56 , Memory Makers wrote:
>
> > Greetings guys,
> >
> > Is there a good front end application / interface for solr?
> >
> > Features I'm looking for are:
> >  configure query interface (using non-programmatic features)
> >  configure pagination
> >  configure bookmarking of results
> >  export results of a query to a csv or other format (JSON, etc.)
> >
> >  Is there any demand for such an application?
> >
> > Thanks.
>
>


Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Memory Makers
Cool -- I was hoping to avoid adding another language :-( Python/Java/PHP
were going to be it for me -- but I guess not.

Thanks.

On Tue, Oct 25, 2011 at 6:02 AM, Erik Hatcher wrote:

> You could be up and running with Blacklight by following the quickstart
> instructions in only a few minutes, but Ruby and RoR know-how will be needed
> to go further with the types of customizations you mentioned.  Some things
> will be purely in configuration sections (but still within Ruby code files)
> and done easily, but some other customizations will require deeper
> knowledge.
>
> With only a few minutes (given the prerequisites already installed) to give
> it a try, might as well give it a go :)  The Blacklight community is very
> helpful too, so ask on their e-mail list for assistance, or tap into the
> #blacklight IRC channel.
>
>Erik
>
>
> On Oct 25, 2011, at 05:53 , Memory Makers wrote:
>
> > Looks very interesting -- actually I looked at it a while back but in a
> > different context -- for a non RoR person how much of a learning curve is
> it
> > to set up?
> >
> > Thanks.
> >
> > On Tue, Oct 25, 2011 at 5:49 AM, Erik Hatcher  >wrote:
> >
> >> Blacklight - http://projectblacklight.org/
> >>
> >> It's a full featured application fronting Solr.  It's Ruby on Rails
> based,
> >> and powers many library front-ends but is becoming much more general
> purpose
> >> for other domains.  See examples here:
> >> https://github.com/projectblacklight/blacklight/wiki/Examples
> >>
> >> Also, the forensics domain has used it as well, as mentioned in the
> slides
> >> and talk I attended at Lucene Revolution earlier this year: <
> >>
> http://www.lucidimagination.com/blog/2011/06/01/solr-and-law-enforcement-highly-relevant-results-can-be-a-crime/
> >>>
> >>
> >> Often the decision for an application layer like this is determined by
> the
> >> programming language and frameworks used.  Blacklight is "opinionated"
> (as
> >> any other concrete implementation would be) in this regard.  If it fits
> your
> >> tastes, it's a great technology to use.
> >>
> >>   Erik
> >>
> >>
> >> On Oct 24, 2011, at 15:56 , Memory Makers wrote:
> >>
> >>> Greetings guys,
> >>>
> >>> Is there a good front end application / interface for solr?
> >>>
> >>> Features I'm looking for are:
> >>> configure query interface (using non-programmatic features)
> >>> configure pagination
> >>> configure bookmarking of results
> >>> export results of a query to a csv or other format (JSON, etc.)
> >>>
> >>> Is there any demand for such an application?
> >>>
> >>> Thanks.
> >>
> >>
>
>


Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Memory Makers
Well https://github.com/evolvingweb/ajax-solr is fairly decent for that --
haven't used it in a while but that is a minimalist client -- however I find
it hard to customize.

MM.

On Tue, Oct 25, 2011 at 8:34 AM, Fred Zimmerman wrote:

> what about something that's a bit less discovery-oriented? for my
> particular
> application I am most concerned with bringing back a straightforward "top
> ten" answer set and having users look at it. I actually don't want to
> bother
> them with faceting, etc. at this juncture.
>
> Fred
>
> On Tue, Oct 25, 2011 at 7:40 AM, Erik Hatcher  >wrote:
>
> >
> > On Oct 25, 2011, at 07:24 , Robert Stewart wrote:
> >
> > > It is really not very difficult to build a decent web front-end to SOLR
> > using one of the available client libraries
> >
> > Or even just not using any client library at all (other than an HTTP
> > library).  I've done a bit of proof-of-concept/prototyping with a super
> > light weight (and of course Ruby!) approach with my Prism tinkering: <
> > https://github.com/lucidimagination/Prism>
> >
> > Yes, in general it's very straightforward to build a search UI that shows
> > results, pages through them, displays facets, and allows them to be
> clicked
> > and filter results and so on.  Devil is always in the details, and having
> > saved searches, export, customizability, authentication, and so on makes
> it
> > a more involved proposition.
> >
> > If you're in a PHP environment, there is VUFind... again pretty
> > library-centric at first, but likely flexible enough to handle any Solr
> > setup.  For the Pythonistas, there's Kochief -
> > http://code.google.com/p/kochief/
> >
> > Being a Rubyist myself (and founder of Blacklight), I'm not intimately
> > familiar with the other solutions but the library world has done a lot to
> > get this sort of thing off the ground in many environments.
> >
> >Erik
> >
> >
>


Pointers on processing hashtags

2011-10-25 Thread Memory Makers
Greetings,

I am trying to index hashtags from Twitter -- so they are tokens that start
with a # symbol and can have any number of alphanumeric characters.

Examples:
1. #jane
2. #Jane
3. #Jane!

At a high level I'd like to be able to:
1. differentiate between say #jane and #jane!
2. differentiate between a hashtag such as #jane and a regular text token
jane
3. ask for variations of #jane -- by this I mean #jane? #jane!!! #jane!?!??
are all variations of #jane

I'd appreciate pointers on what my considerations should be when I attempt
to do the above.

Thanks,

MM.
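
One way to frame the considerations, sketched in Python (the normalization rules and the two-field split are illustrative assumptions, not an established recipe): keep the raw hashtag in one field for exact matches like #jane vs. #jane!, and a normalized base form in a second field so punctuation variants collapse together. On the Solr side this implies an analyzer that preserves '#' (e.g. WhitespaceTokenizerFactory; StandardTokenizer strips it):

```python
import re

HASHTAG = re.compile(r"#\w+[!?]*")  # '#', alphanumerics, optional trailing !/?

def extract_hashtags(text):
    """Raw hashtags as written, case and punctuation preserved."""
    return HASHTAG.findall(text)

def base_form(tag):
    """Collapse variants: lowercase and strip trailing !/? so #Jane!,
    #jane?!? and #jane all share the base form '#jane'."""
    return re.sub(r"[!?]+$", "", tag).lower()

doc = "Saw #Jane! and #jane today, also plain jane"
raw = extract_hashtags(doc)           # exact-match field: ['#Jane!', '#jane']
bases = [base_form(t) for t in raw]   # variation field: ['#jane', '#jane']
print(raw, bases)
```

Plain tokens like "jane" never enter either field, which covers the hashtag-vs-text distinction.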


Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Memory Makers
Erick,

NGrams -- could you elaborate on that? I haven't seen that before.

Thanks.

On Tue, Nov 1, 2011 at 11:06 AM, Erick Erickson wrote:

> NGrams are often used in Solr for this case, but they will also add to
> your index size.
>
> It might be worthwhile to look closely at your user requirements
> before going ahead
> and supporting this functionality.
>
> Best
> Erick
>
> 2011/11/1 François Schiettecatte :
> > Kuli
> >
> > Good point about just tokenizing the fields :)
> >
> > I ran a couple of tests to double-check my understanding and you can
> have a wildcard operator at either or both ends of a term. Adding
> ReversedWildcardFilterFactory to your field analyzer will make leading
> wildcard searches a lot faster of course but at the expense of index size.
> >
> > Cheers
> >
> > François
> >
> >
> > On Nov 1, 2011, at 9:07 AM, Michael Kuhlmann wrote:
> >
> >> Hi,
> >>
> >> this is not exactly true. In Solr, you can't have the wildcard operator
> on both sides of a term.
> >>
> >> However, you can tokenize your fields and simply query for "Solr". This
> is what's Solr made for. :)
> >>
> >> -Kuli
> >>
> >> Am 01.11.2011 13:24, schrieb François Schiettecatte:
> >>> Arshad
> >>>
> >>> Actually it is available, you need to use the
> ReversedWildcardFilterFactory which I am sure you can Google for.
> >>>
> >>> Solr and SQL address different problem sets with some overlaps but
> there are significant differences between the two technologies. Actually
> '%Solr%' is a worst case for SQL but handled quite elegantly in Solr.
> >>>
> >>> Hope this helps!
> >>>
> >>> Cheers
> >>>
> >>> François
> >>>
> >>>
> >>> On Nov 1, 2011, at 7:46 AM, arshad ansari wrote:
> >>>
>  Hi,
> 
>  Is SQL Like operator feature available in Apache Solr Just like we
> have it
>  in SQL.
> 
>  SQL example below -
> 
>  *Select * from Employee where employee_name like '%Solr%'*
> 
>  If not, is it a bug in Solr? If this feature is available, please point
>  to the examples available.
> 
>  Thanks!
> 
>  --
>  Best Regards,
>  Arshad
> >>>
> >>
> >
> >
>
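
To illustrate the NGram idea in toy form (this is a sketch of the concept, not Solr's NGramFilterFactory implementation): each indexed term is broken into character substrings, so a contains-style match like SQL's LIKE '%olr%' reduces to a plain term lookup, at the cost of a much larger index:

```python
def char_ngrams(term, min_n=2, max_n=4):
    """All character n-grams of term for n in [min_n, max_n]."""
    term = term.lower()
    return {term[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(term) - n + 1)}

# Index side: store the grams of each term.
indexed = char_ngrams("ApacheSolr")

# Query side: a contains-match for 'solr' succeeds if its grams
# appear among the indexed grams -- a simple lookup, no scan.
print("solr" in indexed)
```

Queries longer than max_n are typically broken into grams themselves and intersected, which is roughly what gram-based filters do.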


simple persistence layer on top of Solr

2011-11-01 Thread Memory Makers
Greetings guys,

I have been thinking of using Solr as a simple database due to its
blinding speed -- actually I've used that approach in some projects with
decent success.

Any thoughts on that?

Thanks,

MM.


Re: simple persistence layer on top of Solr

2011-11-01 Thread Memory Makers
Well I want something beyond a key value store.

  I want to be able to free-text search documents
  I want to be able to retrieve documents based on other criteria

  I'm not sure how that would compare with something like MongoDB.

Thanks.

On Tue, Nov 1, 2011 at 11:49 AM, Walter Underwood wrote:

> Other than "it isn't a database"?
>
> If you want a key/value store, use one of those. If you want a full DB
> with transactions, use one of those.
>
> wunder
>
> On Nov 1, 2011, at 8:47 AM, Memory Makers wrote:
>
> > Greetings guys,
> >
> > I have been thinking of using Solr as a simple database due to its
> > blinding speed -- actually I've used that approach in some projects with
> > decent success.
> >
> > Any thoughts on that?
> >
> > Thanks,
> >
> > MM.
>
>
>
>
>
>


Re: simple persistence layer on top of Solr

2011-11-01 Thread Memory Makers
Well,

I've done a lot of work with MySQL and content management systems -- and
frankly whenever I have to integrate with Solr or do some Lucene work I am
amazed at the speed -- even when I index web pages for search -- MySQL
pales by comparison when data sets get large (>2 million rows).

Thanks,

MM.

On Tue, Nov 1, 2011 at 12:01 PM, Robert Stewart wrote:

> One other potentially huge consideration is how "updatable" you need
> documents to be.  Lucene only can replace existing documents, it cannot
> modify existing documents directly (so an update is essentially a delete
> followed by an insert of a new document with the same primary key).  There
> are performance considerations here as well (how to do bulk updates
> quickly, etc.).
>
> Bob
>
>
> On Nov 1, 2011, at 11:47 AM, Memory Makers wrote:
>
> > Greetings guys,
> >
> > I have been thinking of using Solr as a simple database due to its
> > blinding speed -- actually I've used that approach in some projects with
> > decent success.
> >
> > Any thoughts on that?
> >
> > Thanks,
> >
> > MM.
>
>
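
Robert's delete-plus-insert point can be modeled with a toy in-memory index (purely illustrative; Lucene's real storage works very differently): an 'update' replaces the whole document under its primary key, so any fields omitted from the new version are lost rather than preserved:

```python
class ToyIndex:
    """Minimal model of Lucene/Solr update semantics: an update is a
    delete of the old document plus an add of the complete new one."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        key = doc["id"]
        self.docs.pop(key, None)  # delete any existing doc first
        self.docs[key] = doc      # then add the full replacement

idx = ToyIndex()
idx.add({"id": "1", "title": "draft", "body": "text"})
idx.add({"id": "1", "title": "final"})  # 'update': body is NOT kept
print(idx.docs["1"])
```

This is the main behavioral gap versus a database UPDATE, and why clients must always resend complete documents.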


performance - dynamic fields versus static fields

2011-11-03 Thread Memory Makers
Hi,

Is there a handy resource on the:
  a. performance of: dynamic fields versus static fields
  b. other pros-cons?

Thanks.
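
For context, a dynamic field is only a name pattern resolved when a field name is first seen; query-time behavior is generally reported to be the same as for a static field, with the real trade-off being schema tidiness versus accidental field sprawl. A hedged schema.xml sketch (names and types are illustrative):

```xml
<!-- Any field ending in _s is indexed as a stored string without being
     declared up front; *_txt gets text analysis. The pattern match
     happens at field-name resolution, not per query term, so the
     query-time cost is essentially the same as a static field. -->
<dynamicField name="*_s"   type="string" indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text"   indexed="true" stored="true"/>
```

The usual cons are looser schema validation (typos silently create new fields) and less self-documenting field lists.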


Re: Can Apache Solr Handle TeraByte Large Data

2012-01-16 Thread Memory Makers
I've been toying with the idea of setting up an experiment to index a large
document set (1+ TB) -- any thoughts on an open data set that one could use
for this purpose?

Thanks.

On Mon, Jan 16, 2012 at 5:00 PM, Burton-West, Tom wrote:

> Hello ,
>
> Searching real-time sounds difficult with that amount of data. With large
> documents, 3 million documents, and 5TB of data the index will be very
> large. With indexes that large your performance will probably be I/O bound.
>
> Do you plan on allowing phrase or proximity searches? If so, your
> performance will be even more I/O bound as documents that large will have
> huge positions indexes that will need to be read into memory for processing
> phrase queries. To reduce I/O you need as much of the index in memory
> (Lucene/Solr caches, and operating system disk cache).  Every commit
> invalidates the Solr/Lucene caches (unless the newer nrt code has solved
> this for Solr).
>
> If you index and serve on the same server, you are also going to get
> terrible response time whenever your commits trigger a large merge.
>
> If you need to service 10-100 qps or more, you may need to look at putting
> your index on SSDs or spreading it over enough machines so it can stay in
> memory.
>
> What kind of response times are you looking for and what query rate?
>
> We have somewhat smaller documents. We have 10 million documents and about
> 6-8TB of data in HathiTrust and have spread the index over 12 shards on 4
> machines (i.e. 3 shards per machine).   We get an average of around
> 200-300ms response time but our 95th percentile times are about 800ms and
> 99th percentile are around 2 seconds.  This is with an average load of less
> than 1 query/second.
>
> As Otis suggested, you may want to implement a strategy that allows users
> to search within the large documents by breaking the documents up into
> smaller units. What we do is have two Solr indexes.  The first indexes
> complete documents.  When the user clicks on a result, we index the entire
> document on a page level in a small Solr index on-the-fly.  That way they
> can search within the document and get page level results.
>
> More details about our setup:
> http://www.hathitrust.org/blogs/large-scale-search
>
> Tom Burton-West
> University of Michigan Library
> www.hathitrust.org
> -Original Message-
>
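
Tom's two-index, page-level strategy can be sketched as follows (Python; the field names and id scheme are illustrative assumptions, not HathiTrust's actual schema): when a user opens a result, the large document is split into per-page documents and fed to a small on-the-fly index so searches within the document return page-level hits:

```python
def page_docs(doc_id, pages):
    """Split one large document into per-page documents for a small
    secondary index, enabling search-within-document at page level."""
    return [{"id": f"{doc_id}-p{n}", "parent": doc_id, "page": n, "text": text}
            for n, text in enumerate(pages, start=1)]

docs = page_docs("book1", ["alpha bravo", "charlie delta"])
print(docs[0]["id"], docs[1]["text"])
```

The primary index keeps whole documents small in number but large in size; the secondary index stays tiny because it only ever holds the documents currently being inspected.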
>


Re: Speeding up indexing

2012-02-27 Thread Memory Makers
Many thanks for the response.

Here are the revised questions:

For example if I have N processes that are producing documents to index:
1. Should I have them simultaneously submit documents to Solr (will this
improve the indexing throughput)?
2. Is there anything I can do Solr configuration wise that will allow me to
speed up indexing
3. Is there an architecture where I can have two (or more) solr server do
indexing in parallel

Thanks.

On Mon, Feb 27, 2012 at 1:46 PM, Erik Hatcher wrote:

> Yes, absolutely.  Parallelizing indexing can make a huge difference.  How
> you do so will depend on your indexing environment.  Most crudely, running
> multiple indexing scripts on different subsets of data up to the
> limitations of your operating system and hardware is how many do it.
> SolrJ has some multithreaded facility, as does DataImportHandler.
>  Distributing the indexing to multiple machines, but pointing all to the
> same Solr server, is effectively the same as multi-threading it: push
> documents into Solr from wherever as fast as it can handle it. This is
> definitely how many do this.
>
>Erik
>
> On Feb 27, 2012, at 13:24 , Memory Makers wrote:
>
> > Hi,
> >
> > Is there a way to speed up indexing by increasing the number of threads
> > doing the indexing or perhaps by distributing indexing on multiple
> machines?
> >
> > Thanks.
>
>
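
Erik's advice can be sketched like this (Python; post_batch is a stub standing in for a real HTTP POST to /update, and the batch size and thread count are arbitrary). On the configuration side, committing less often and raising ramBufferSizeMB in solrconfig.xml are the usual levers:

```python
from concurrent.futures import ThreadPoolExecutor

def post_batch(batch):
    """Stand-in for an HTTP POST of one batch of docs to Solr's /update.
    A real indexer would send JSON/XML here and commit once at the end."""
    return len(batch)  # pretend Solr accepted every doc in the batch

def index_parallel(docs, batch_size=100, workers=4):
    """Batch the docs and submit batches from several threads at once."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(post_batch, batches))

docs = [{"id": str(i)} for i in range(1000)]
print(index_parallel(docs))  # prints 1000
```

The same pattern scales across machines: each producer process runs this loop against the one Solr server, which answers question 1 (yes, simultaneous submission helps, up to the server's I/O limits).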


Re: Speeding up indexing

2012-02-27 Thread Memory Makers
A quick add-on to this -- we have over 30 million documents.

I take it that we should be looking @ Distributed Solr?
  as in
http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e344

Thanks.

On Mon, Feb 27, 2012 at 2:33 PM, Memory Makers wrote:

> Many thanks for the response.
>
> Here are the revised questions:
>
> For example if I have N processes that are producing documents to index:
> 1. Should I have them simultaneously submit documents to Solr (will this
> improve the indexing throughput)?
> 2. Is there anything I can do Solr configuration wise that will allow me
> to speed up indexing
> 3. Is there an architecture where I can have two (or more) solr server do
> indexing in parallel
>
> Thanks.
>
> On Mon, Feb 27, 2012 at 1:46 PM, Erik Hatcher wrote:
>
>> Yes, absolutely.  Parallelizing indexing can make a huge difference.  How
>> you do so will depend on your indexing environment.  Most crudely, running
>> multiple indexing scripts on different subsets of data up to the
>> limitations of your operating system and hardware is how many do it.
>> SolrJ has some multithreaded facility, as does DataImportHandler.
>>  Distributing the indexing to multiple machines, but pointing all to the
>> same Solr server, is effectively the same as multi-threading it: push
>> documents into Solr from wherever as fast as it can handle it. This is
>> definitely how many do this.
>>
>>Erik
>>
>> On Feb 27, 2012, at 13:24 , Memory Makers wrote:
>>
>> > Hi,
>> >
>> > Is there a way to speed up indexing by increasing the number of threads
>> > doing the indexing or perhaps by distributing indexing on multiple
>> machines?
>> >
>> > Thanks.
>>
>>
>
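
For reference, distributed search in Solr of this era is driven by the shards request parameter, while partitioning documents across shards at indexing time is the client's job (host names below are made up):

```python
from urllib.parse import urlencode

# Each entry is host:port/path-to-core; the node receiving the query
# fans it out to every shard and merges the results. Indexing in
# parallel means sending each document to exactly one of these shards.
shards = ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"]
params = {"q": "title:lucene", "shards": ",".join(shards)}
url = "http://solr1:8983/solr/select?" + urlencode(params)
print(url)
```

This answers question 3 above: multiple Solr servers can index disjoint subsets in parallel, then be queried together as shards.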


Re: 400 Error adding field 'tags'='[a,b,c]'

2012-03-13 Thread Memory Makers
left. the $20 under the tile with the hand prints.  thx

On Tuesday, March 13, 2012, jlark  wrote:
> Interestingly I'm getting this on other fields now.
>
> I have the field stored="true"  />
>
> which is copied to text  
>
> and my text field is simply indexed="true" stored="true" />
>
> I'm feeding my test document:
>
> {"url" : "TestDoc2", "title" : "another test", "ptag":["a","b"], "name":"foo bar"},
>
> and when I try to feed I get.
>
> HTTP request sent, awaiting response... 400 ERROR: [doc=TestDoc2] Error
> adding field 'name'='foo bar'
>
> If I remove the field from the document though it works fine.
> I'm wondering if there is a set of reserved names that I'm using at this
> point.
>
> Just wish there was a way to get more helpful error messages.
>
> Thanks for the help.
> Alp
>
>
>
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/400-Error-adding-field-tags-a-b-c-tp3823853p3824126.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
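
A closing note on this thread: a 400 "Error adding field" when feeding an array value (as in the original 'tags'='[a,b,c]' subject) typically means the target field is not declared multiValued. A hedged schema.xml sketch of the usual fix (the type and other attributes are assumptions):

```xml
<!-- Accept multiple values per document for this field. Without
     multiValued="true", feeding ["a","b","c"] into it is rejected
     with a 400 "Error adding field" response. -->
<field name="tags" type="string" indexed="true" stored="true"
       multiValued="true"/>
```

The single-valued 'name'='foo bar' failure above is a different mismatch (field type versus the supplied value), which is why checking the schema definition of each failing field is the first debugging step.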