Re: Entity extraction?
Solr can do simple faceted search like FAST, but entity extraction requires other technologies. I do not know how FAST does it, but at the company I work for (www.cortex-intelligence.com) we use a mix of statistical and language-specific tasks to recognize and categorize entities in the text. LingPipe is another (free) tool that does that too. In case you would like to see a simple demo: http://www.cortex-intelligence.com/tech/

Rossini

On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson <[EMAIL PROTECTED]> wrote:

> During a recent sales pitch to my company by FAST, they mentioned entity
> extraction. I'd never heard of it before, but they described it as
> basically recognizing people/places/things in documents being indexed
> and then being able to do faceting on this data at query time. Does
> anything like this already exist in SOLR? If not, I'm not opposed to
> developing it myself, but I could use some pointers on where to start.
>
> Thanks,
>
> - Charlie
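To make the idea in this thread concrete, here is a minimal Java sketch of the "extract at index time, facet at query time" flow. It is a toy: it uses a hard-coded dictionary lookup (a gazetteer), which is not how FAST, Cortex, or LingPipe work, and the class name, entity list, and sample documents are all made up for illustration. In a real setup the extracted values would presumably be written to a multi-valued Solr field and faceted with ordinary field faceting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy gazetteer-based entity tagger; every name and value here is hypothetical. */
final class EntityFacetSketch {

    // Hypothetical dictionary: surface form -> entity type.
    static final Map<String, String> GAZETTEER = new LinkedHashMap<>();
    static {
        GAZETTEER.put("Chicago", "place");
        GAZETTEER.put("Brazil", "place");
        GAZETTEER.put("Xerox", "organization");
    }

    /** "Index time": return the dictionary entries that occur in the text. */
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        for (String name : GAZETTEER.keySet()) {
            if (text.contains(name)) {
                found.add(name);
            }
        }
        return found;
    }

    /** "Query time": count how many documents mention each entity. */
    static Map<String, Integer> facetCounts(List<String> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            for (String entity : extract(doc)) {
                counts.merge(entity, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Xerox opened an office in Chicago.",
            "Flights from Chicago to Brazil resumed.");
        System.out.println(facetCounts(docs));
    }
}
```

The hard part that the sketch skips is exactly what the thread is about: recognizing entities that are not in any dictionary and telling ambiguous names apart.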
Re: Entity extraction?
Well... IMHO that depends. One of the services we provide is "automatic clipping", in which our client chooses 20~30 texts from the media of the kind he would like to be kept aware of. With classification algorithms we then keep him aware of every new text of interest to him. We gained about 10% in precision just by adding EE information to the algorithm.

Rossini

On Mon, Oct 27, 2008 at 2:17 PM, Walter Underwood <[EMAIL PROTECTED]> wrote:

> The vendor mentioned entity extraction, but that doesn't mean you need it.
> Entity extraction is a pretty specific technology, and it has been a
> money-losing product at many companies for many years, going back to
> Xerox ThingFinder well over ten years ago.
>
> My guess is that very few people really need entity extraction.
>
> Using EE for automatic taxonomy generation is even harder to get right.
> At best, that is a way to get a starter set of categories that you can
> edit. You will not get a production quality taxonomy automatically.
>
> wunder
>
> On 10/27/08 8:31 AM, "Charlie Jackson" <[EMAIL PROTECTED]> wrote:
>
> > True, though I may be able to convince the powers that be that it's
> > worth the investment.
> >
> > There are a number of open source or free tools listed on the Wikipedia
> > entry for entity extraction
> > (http://en.wikipedia.org/wiki/Named_entity_recognition#Open_source_or_free)
> > -- does anyone have any experience with any of these?
> >
> > Charlie Jackson
> > 312-873-6537
> > [EMAIL PROTECTED]
> >
> > -Original Message-
> > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
> > Sent: Monday, October 27, 2008 10:23 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Entity extraction?
> >
> > For the record, LingPipe is not free. It's good, but it's not free.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > - Original Message
> >> From: Rafael Rossini <[EMAIL PROTECTED]>
> >> To: solr-user@lucene.apache.org
> >> Sent: Friday, October 24, 2008 6:08:14 PM
> >> Subject: Re: Entity extraction?
> >>
> >> Solr can do a simple facet search like FAST, but the entity extraction
> >> demands other technologies. I do not know how FAST does it but at the
> >> company I'm working on (www.cortex-intelligence.com), we use a mix of
> >> statistical and language-specific tasks to recognize and categorize
> >> entities in the text. Ling Pipe is another tool (free) that does that
> >> too. In case you would like to see a simple demo:
> >> http://www.cortex-intelligence.com/tech/
> >>
> >> Rossini
> >>
> >> On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson wrote:
> >>
> >>> During a recent sales pitch to my company by FAST, they mentioned
> >>> entity extraction. I'd never heard of it before, but they described
> >>> it as basically recognizing people/places/things in documents being
> >>> indexed and then being able to do faceting on this data at query
> >>> time. Does anything like this already exist in SOLR? If not, I'm not
> >>> opposed to developing it myself, but I could use some pointers on
> >>> where to start.
> >>>
> >>> Thanks,
> >>>
> >>> - Charlie
ArrayIndexOutOfBoundsException on TermScorer
Hello all,

On one simple query against my index, "http://localhost:8983/solr/select/?q=brasil", I get this:

1226511

java.lang.ArrayIndexOutOfBoundsException: 1226511
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:74)
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:61)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
    at org.apache.lucene.search.Searcher.search(Searcher.java:118)
    at org.apache.lucene.search.Searcher.search(Searcher.java:97)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:888)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:805)
    at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:698)
    at com.cortex.solr.handler.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:151)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:659)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:193)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:161)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:368)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Does anyone have a clue what the problem is? In Lucene's TermScorer class, the exception is thrown on this line:

    score *= normDecoder[norms[doc] & 0xFF]; // normalize for field

Thanks for any help
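The number in the exception message is the array index that was rejected, so the trace suggests that the scorer was handed document id 1226511 while the `norms` array for the field was shorter than that. Why the two disagree is not something the trace itself shows. A minimal Java sketch of that failure mode follows; the class name and array sizes are made up for illustration, and this is not Lucene's code.

```java
/** Illustration only: reproduces the kind of failure in the trace, not Lucene's code. */
final class NormsBoundsSketch {

    /** Mirrors the shape of the failing lookup: one norm byte per document id. */
    static int normIndex(byte[] norms, int doc) {
        return norms[doc] & 0xFF; // throws if doc is outside the norms array
    }

    /** Returns true when the lookup for this doc id would fall outside the array. */
    static boolean wouldOverflow(byte[] norms, int doc) {
        return doc < 0 || doc >= norms.length;
    }

    public static void main(String[] args) {
        byte[] norms = new byte[10]; // hypothetical: norms cover 10 documents
        int doc = 12;                // hypothetical doc id beyond that range
        System.out.println("out of range: " + wouldOverflow(norms, doc));
        try {
            normIndex(norms, doc);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```

If that reading is right, comparing the index's document count with the offending id would be a reasonable first check.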
olap with solr (math operations on facets)
Hi all,

I'm considering building something like a "light-weight OLAP" server with Lucene/Solr. To achieve that I'd have to do some math operations on facets. Is that possible?

For example, my documents would be purchase rows, like (id, value, id_department, id_store, id_region ...). If I did a facet query on id_department, the server would return something like: department1: 500, department2: 400... Is it possible to get the sum, the average, or any other math operation on the field value? Then the server would return: department1: 100 (the sum of the values). Is it clear?

[]s
Rossini
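To spell out the difference between the two results, here is a small plain-Java sketch over in-memory rows. It is not a Solr API; the class, field names, and sample values are assumptions chosen only to match the example in the message.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: plain Java over in-memory rows, not a Solr API. */
final class FacetSumSketch {

    /** One purchase row, reduced to the two fields used here. */
    static final class Purchase {
        final String idDepartment;
        final double value;

        Purchase(String idDepartment, double value) {
            this.idDepartment = idDepartment;
            this.value = value;
        }
    }

    /** What a facet on id_department returns: the number of rows per department. */
    static Map<String, Integer> counts(Purchase[] rows) {
        Map<String, Integer> result = new TreeMap<>();
        for (Purchase p : rows) {
            result.merge(p.idDepartment, 1, Integer::sum);
        }
        return result;
    }

    /** What the question asks for: the sum of `value` per department. */
    static Map<String, Double> sums(Purchase[] rows) {
        Map<String, Double> result = new TreeMap<>();
        for (Purchase p : rows) {
            result.merge(p.idDepartment, p.value, Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Purchase[] rows = {
            new Purchase("department1", 60.0),
            new Purchase("department1", 40.0),
            new Purchase("department2", 25.0),
        };
        System.out.println("counts: " + counts(rows));
        System.out.println("sums:   " + sums(rows));
    }
}
```

With these rows, department1 has a count of 2 but a sum of 100, which is the distinction the question is after.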
Re: olap with solr (math operations on facets)
Thanks for the reply, Mike. Are there any plans to do something like this? Or is there any direction anyone could give?

[]s
Rossini

On 9/21/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

> On 21-Sep-07, at 8:27 AM, Rafael Rossini wrote:
>
> > Hi all,
> >
> > I'm considering building something like a "light-weight OLAP" server
> > with Lucene/Solr. To achieve that I'd have to do some math operations
> > on facets. Is that possible?
> > For example, my documents would be purchase rows, like (id, value,
> > id_department, id_store, id_region ...). If I did a facet query on
> > id_department, the server would return something like: department1:
> > 500, department2: 400... Is it possible to get the sum, the average,
> > or any other math operation on the field value? Then the server would
> > return: department1: 100 (the sum of the values). Is it clear?
>
> Currently this is not possible out of the box with Solr.
>
> -Mike
Re: olap with solr (math operations on facets)
Thanks for the tip, I'll look at it.

[]s
Rossini

On 9/21/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

> On 21-Sep-07, at 2:42 PM, Rafael Rossini wrote:
>
> > Thanks for the reply, Mike. Are there any plans to do something like
> > this? Or is there any direction anyone could give?
>
> Probably the easiest thing to do is write a custom request handler
> that iterates over the field cache and computes the statistics you
> want (loading the docs would probably be too slow).
>
> Check out SimpleFacets.java to see how it uses the FieldCache.
>
> -Mike
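A rough Java sketch of the approach Mike describes: instead of loading stored documents, walk per-document arrays of field values (which is what a field cache provides) and accumulate the statistics for the documents that matched. Plain arrays stand in for the FieldCache here, and the class, method, and parameter names are hypothetical; this is neither SimpleFacets code nor a real request handler.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: arrays indexed by document id stand in for a field cache. */
final class FieldCacheStatsSketch {

    /** Running statistics for one facet value. */
    static final class Stats {
        double sum;
        int count;

        double avg() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    /**
     * departmentByDoc[i] and valueByDoc[i] hold the field values of document i;
     * matchingDocs lists the ids of the documents that matched the query.
     */
    static Map<String, Stats> compute(String[] departmentByDoc,
                                      double[] valueByDoc,
                                      int[] matchingDocs) {
        Map<String, Stats> byDepartment = new TreeMap<>();
        for (int doc : matchingDocs) {
            Stats s = byDepartment.computeIfAbsent(departmentByDoc[doc], k -> new Stats());
            s.sum += valueByDoc[doc];
            s.count++;
        }
        return byDepartment;
    }
}
```

The point of the design is that each matching document costs two array reads and no stored-field access, which is why it avoids the slowness Mike mentions.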
Re: solr+hadoop = next solr
Hi, Jeff and Mike.

Would you mind telling us a little about the architecture of your solutions? Mike, you said that you implemented a highly-distributed search engine using Solr as indexing nodes. What does that mean? Did you implement a master/multi-slave setup for replication? Or did you shard the whole index for high availability and failover?

On 6/7/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:

Mike - thanks for the comments. Some responses added below.

On 6/7/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

> I've implemented a highly-distributed search engine using Solr (200m
> docs and growing, 60+ servers). It is not a Solr-based solution in
> the vein of FederatedSearch--it is a higher-level architecture that
> uses Solr as indexing nodes. I'll note that it is a lot of work and
> would be even more work to develop in the generic extensible
> philosophy that Solr espouses.

Yeah, we've done the same thing in the .Net world, and it's a tough slog. We're in the same situation -- making our solution generically extensible is pretty much a non-starter.

> > In terms of the FederatedSearch wiki entry (updated last year), has
> > there been any progress made this year on this topic, at least
> > something worthy of being added or updated to the wiki page? Not to
> > splinter efforts here, but maybe a working group that was focused on
> > that topic could help to move things forward a bit.
>
> I don't believe that absence of organization has been the cause of
> lack of forward progress on this issue, but simply that there has
> been no-one sufficiently interested and committed to prioritizing
> this huge task to work on it. There is no need to form a working
> group (not when there are only a handful of active committers to
> begin with)--all interested people could just use solr-dev@ for
> discussion.

That makes sense, just didn't want to bombard the list with the subject if it was a detractor from the core project, i.e. keep lucene messages on lucene, solr messages on solr, etc. The good-community-participant approach, if you will.

> Solr is an open-source project, so huge features will get implemented
> when there is a person or group of people devoted to leading the
> charge on the issue. If you're interested in being that person,
> that's great!

Glad to jump in, not sure I qualify as such for that, but certainly a big cheerleader nonetheless.
Re: multiple indices
I have 3 different instances of Solr on Jetty 6.1.13, but you need Jetty Plus. My etc/jetty.xml looks like this:

    /webapps/solr1
    /solr1
    /etc/webdefault.xml
    solr/home
    override this value

    /webapps/solr2
    /solr2
    /etc/webdefault.xml
    solr/home
    override this value

Then, in webapps/solr1/WEB-INF, you need a jetty-env.xml like this:

    http://jetty.mortbay.org/configure.dtd
    solr/home
    /solr1

Hope it helps

On 6/26/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

Hm, that JNDI again... this makes it sound like SOLR-215 is completely superfluous? I have not configured Jetty this way yet, but I do see some docs on http://wiki.apache.org/solr/SolrJetty . Interestingly, the configs look a lot different than what's described on http://docs.codehaus.org/display/JETTY/JNDI . I also remember Jetty Plus from a while back, but now I cannot find any information about Jetty Plus 6.*, only 5 - http://jetty.mortbay.org/jetty5/plus/index.html .

Otis

- Original Message
From: Chris Hostetter <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, June 26, 2007 8:10:46 PM
Subject: Re: multiple indices

: I have multiple applications (blogs/forums/video/etc) - each of these
: is independent (no need to perform queries on multiple indices).
: Would it be best to use multiple instances of SOLR/JVM - one for each
: index or use a solution where only one JVM instance is running (maybe
: solr-215?)?

you don't actually need multiple JVM instances to run multiple Solr instances ... you can configure your ServletContainer to run the solr.war in multiple contexts, each of which has a different solrconfig.xml and schema.xml (using JNDI) ... that way you get most of the benefits of isolated instances but can also take advantage of a single large heap and common connection management.

-Hoss
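For context on why the `solr/home` entry matters: each webapp context gets its own JNDI environment, and Solr looks up `java:comp/env/solr/home` there to find its home directory, so one JVM can host several contexts that each point at a different home. A minimal Java sketch of such a lookup follows. The class name and the fallback argument are assumptions for illustration, and this is not Solr's actual code.

```java
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

/** Sketch only: resolves a per-webapp home directory from JNDI, with a fallback. */
final class SolrHomeLookup {

    /**
     * Looks up java:comp/env/solr/home in the current JNDI environment.
     * Returns the fallback when no JNDI environment or entry is available,
     * for example when run outside a servlet container.
     */
    static String resolveSolrHome(String fallback) {
        try {
            Context ctx = new InitialContext();
            Object value = ctx.lookup("java:comp/env/solr/home");
            return value != null ? value.toString() : fallback;
        } catch (NamingException e) {
            // No container-provided environment, or the entry is not bound.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveSolrHome("solr"));
    }
}
```

Inside a container, the /solr1 and /solr2 contexts would each see their own binding for the same JNDI name, which is what makes the single-JVM setup Hoss describes work.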