Re: MoreLikeThis and dynamicField
Hi Lance,

> You cannot give it a wildcard to go find dynamic fields;

ah, ok. But it would be nice to use wildcards.
You correctly guessed that I meant the wildcard querying. The dynamicField definition is not the problem ... So one can define and then the query via myfield_t would work.
The termVectors is only for performance reasons, I think (it worked without this attribute).

Now, regarding "stored" or "not-stored": mlt worked for me if I did:

But not if I do

Or do I need to configure the field in a different way?

Regards,
Peter.

> MoreLikeThis works on any field you can name. You cannot give it a
> wildcard to go find dynamic fields; no query feature can do this
> (would be handy!).
>
> The field must have term vectors configured. In the
> solr/example/solr/conf schema, the 'includes' field has this set and
> you can do MLT on that. Try 'usb' as a search term for the electronics
> store example in solr/example/solr.
>
> I don't know if the needs to be stored. Also, whether it can work on
> terms in a multi-valued field. You'll have to reconfigure the 'text'
> copyField target to have term vectors.
>
> On Fri, Jun 11, 2010 at 1:06 PM, Peter Karich wrote:
>
>> Hi,
>>
>> it seems to me that the MoreLikeThis component doesn't work for dynamic
>> fields. Is that correct?
>> And it also doesn't work for fields which are indexed but not stored,
>> right? e.g. 'text' where dynamic fields could be copied to.
>>
>> Or did I create an incorrect example?
>>
>> Regards,
>> Peter.
>>
>> --
>> http://karussell.wordpress.com/

--
http://karussell.wordpress.com/
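Since MoreLikeThis cannot expand a wildcard into dynamic field names, the client has to list each concrete field itself. The following is only a minimal sketch of that idea in Python: the host and port are assumptions, `build_mlt_url` is a hypothetical helper, and the field names `includes` and `myfield_t` are simply the ones mentioned in this thread. `mlt`, `mlt.fl`, `mlt.mintf` and `mlt.mindf` are the standard MoreLikeThis request parameters.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, query, fields, min_term_freq=1, min_doc_freq=1):
    """Build a /select URL with the MoreLikeThis component switched on.

    `fields` must be concrete field names; wildcards such as "*_t" are
    rejected here because, per this thread, MLT cannot expand them.
    """
    for name in fields:
        if "*" in name or "?" in name:
            raise ValueError(f"mlt.fl needs concrete field names, got {name!r}")
    params = {
        "q": query,
        "mlt": "true",
        "mlt.fl": ",".join(fields),   # comma-separated list of fields
        "mlt.mintf": min_term_freq,   # minimum term frequency in the source doc
        "mlt.mindf": min_doc_freq,    # minimum document frequency in the index
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local example instance; adjust the base URL to your setup.
url = build_mlt_url("http://localhost:8983/solr", "usb", ["includes", "myfield_t"])
print(url)
```

A dynamic field such as `myfield_t` is named like any other field here; only the pattern form (`*_t`) is refused.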
Solr and Nutch/Droids - to use or not to use?
Hello community, and a nice Saturday,

from several discussions about Solr and Nutch, I got some questions for a virtual web search engine.

The requirements:

I. I need a scalable solution for a growing index that becomes larger than one machine can handle. If I add more hardware, I want performance to improve linearly.

II. I want to use technologies like the OPIC algorithm (the default algorithm in Nutch) or PageRank or whatever else is out there to improve the ranking of the web pages.

III. I want to be able to easily add more fields to my documents. Imagine one retrieves information from a web page's content; then I want to make it searchable.

IV. While fetching my data, I want to make special searches possible. For example, I want to retrieve pictures from a web page and index picture-related content into another search index, plus I want to save a small thumbnail of the picture itself. Btw: this is (as far as I know) not possible with Solr, because Solr was not intended to do such special indexing logic.

V. I want to use filter queries (i.e. the main query "christopher lee" returns 1.5 million results; with the subquery "action", the main query would become a filter query and "action" would be the actual query. So a search within search results would be easy to offer).

VI. I want to be able to use different logic for different pages. Maybe I have a pool of 100 domains that I know better than others, and I have special scripts that retrieve more specific information from those 100 domains. Then I want to apply my special logic to those 100 domains, while every other domain uses the default logic.

The project is only virtual. So why am I asking? I want to learn more about web search and I would like to gain some new experience.

What do I know about Solr + Nutch:

As it is said on lucidimagination.com, Solr + Nutch does not scale if the index is too large.
The article was a little older and I don't know whether this problem has been fixed by the new distributed capabilities of Solr.

Furthermore, I don't want to index the pages with Nutch and reindex them with Solr. The only exception would be: if the content of a web page gets indexed by Nutch, I want to use the already tokenized content of the body with some Solr copyField operations to extend the search (i.e. making fuzzy search possible). At the moment, I don't think this is possible.

I don't know much about the Droids project or how well it is documented. But from what I can read in some posts by Otis, it seems to be usable as a crawler framework.

Pros for Nutch: It is very scalable! Thanks to Hadoop and MapReduce it is a scaling monster (from what I've read).

Cons: The search is not as rich as what is possible with Solr. Extending Nutch's search abilities *seems* to be more complicated than with Solr. Furthermore, if I want to use Solr to search Nutch's index, then looking at my requirements I would need to reindex the whole thing, without the benefits of Hadoop.

What I don't know at the moment is how to use algorithms like those mentioned in II. with Solr.

I hope you understand the problem here: Solr *seems* to me not to be the best solution for a web search engine, because of how indexing scales.

Where should I dive deeper? Solr + Droids? Solr + Nutch? Nutch + howToExtendNutchToMakeSearchBetter?

Thanks for the discussion!
- Mitch

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp890640p890640.html
Sent from the Solr - User mailing list archive at Nabble.com.
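Requirement V (search within search results) maps onto Solr's `q` and `fq` request parameters: the earlier query moves into `fq` and the refinement becomes `q`. What follows is only a rough Python sketch; the base URL is an assumption and `search_within_results` is a hypothetical helper name, not part of any Solr client.

```python
from urllib.parse import urlencode


def search_within_results(base_url, previous_query, refinement, rows=10):
    """Build a /select URL that searches `refinement` inside `previous_query`.

    The previous query is sent as a filter query (fq), so it restricts the
    result set without contributing to the relevance score; the refinement
    is the scored main query (q).
    """
    params = [
        ("q", refinement),        # scored query, e.g. "action"
        ("fq", previous_query),   # filter, e.g. the phrase "christopher lee"
        ("rows", rows),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local instance; the two query strings come from requirement V.
url = search_within_results("http://localhost:8983/solr", '"christopher lee"', "action")
print(url)
```

Because filter queries are cached independently of the main query, repeated refinements against the same earlier query can reuse the cached filter.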
Solr DataConfig / DIH Question
I'm putting together an entity. A simplified version of the database schema is below. There is a 1-[0,1] relationship between Person and Address with address_id being the nullable foreign key. If it makes any difference, I'm using SQL Server 2005 on the backend.

Person [id (pk), name, address_id (fk)]
Address [id (pk), zipcode]

My data config looks like the one below. This naturally fails when the address_id is null since the query ends up being "select * from user.address where id = ".

I've worked around it by using a config like this one. However, this makes the queries quite complex for some of my larger joins.

Is there a cleaner / better way of handling these types of relationships? I've also tried to specify a default in the Solr schema, but that seems to only work after all the data is indexed, which makes sense but surprised me initially. BTW, thanks for the great DIH tutorial on the wiki!

Thanks!
Charles
Re: MoreLikeThis and dynamicField
MLT needs term vectors. You may either store the term vectors while indexing, or store the text so that the MLT handler can pull it and analyze it. (Highlighting has the same problem.)

I don't know if you still need the stored text with term vectors in the index, but storing neither will definitely not work.

On Sat, Jun 12, 2010 at 1:37 AM, Peter Karich wrote:
> Hi Lance,
>
>> You cannot give it a wildcard to go find dynamic fields;
>
> ah, ok. But it would be nice to use wildcards.
> You correctly guessed that I meant the wildcard querying. The
> dynamicField definition is not the problem ... So one can define
> and
> then the query via myfield_t would work.
> The termVectors is only for performance reasons, I think (it worked
> without this attribute).
>
> Now, regarding "stored" or "not-stored": mlt worked for me if I did:
>
>
> But not if I do
>
>
> Or do I need to configure the field in a different way?
>
> Regards,
> Peter.
>
>> MoreLikeThis works on any field you can name. You cannot give it a
>> wildcard to go find dynamic fields; no query feature can do this
>> (would be handy!).
>>
>> The field must have term vectors configured. In the
>> solr/example/solr/conf schema, the 'includes' field has this set and
>> you can do MLT on that. Try 'usb' as a search term for the electronics
>> store example in solr/example/solr.
>>
>> I don't know if the needs to be stored. Also, whether it can work on
>> terms in a multi-valued field. You'll have to reconfigure the 'text'
>> copyField target to have term vectors.
>>
>> On Fri, Jun 11, 2010 at 1:06 PM, Peter Karich wrote:
>>
>>> Hi,
>>>
>>> it seems to me that the MoreLikeThis component doesn't work for dynamic
>>> fields. Is that correct?
>>> And it also doesn't work for fields which are indexed but not stored,
>>> right? e.g. 'text' where dynamic fields could be copied to.
>>>
>>> Or did I create an incorrect example?
>>>
>>> Regards,
>>> Peter.
>>>
>>> --
>>> http://karussell.wordpress.com/
>
> --
> http://karussell.wordpress.com/

--
Lance Norskog
goks...@gmail.com
Re: Solr DataConfig / DIH Question
This is a slow way to do this; databases are capable of doing this join and feeding the results very efficiently.

The 'skipDoc' feature allows you to break out of the processing chain after the first query. It is used in the wikipedia example.

http://wiki.apache.org/solr/DataImportHandler

On Sat, Jun 12, 2010 at 6:37 PM, Holmes, Charles V. wrote:
> I'm putting together an entity. A simplified version of the database schema
> is below. There is a 1-[0,1] relationship between Person and Address with
> address_id being the nullable foreign key. If it makes any difference, I'm
> using SQL Server 2005 on the backend.
>
> Person [id (pk), name, address_id (fk)]
> Address [id (pk), zipcode]
>
> My data config looks like the one below. This naturally fails when the
> address_id is null since the query ends up being "select * from user.address
> where id = ".
>
> Query="select * from user.person">
> Query="select * from user.address where id = ${person.address_id}"
>
> I've worked around it by using a config like this one. However, this makes
> the queries quite complex for some of my larger joins.
>
> Query="select * from user.person">
> Query="select * from user.address where id = (select address_id from
> user.person where id = ${person.id})">
>
> Is there a cleaner / better way of handling these type of relationships?
> I've also tried to specify a default in the Solr schema, but that seems to
> only work after all the data is indexed which makes sense but surprised me
> initially. BTW, thanks for the great DIH tutorial on the wiki!
>
> Thanks!
> Charles

--
Lance Norskog
goks...@gmail.com
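To illustrate the point about letting the database do the join, here is a small self-contained sketch. It is written under loud assumptions: it uses SQLite (via Python's `sqlite3`) as a stand-in for SQL Server 2005, the table names drop the `user.` schema prefix, and the sample rows are invented. A LEFT JOIN keeps persons whose `address_id` is NULL, so no second per-row query is needed and nothing fails on a missing foreign key.

```python
import sqlite3

# In-memory stand-in for the Person/Address schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (id INTEGER PRIMARY KEY, zipcode TEXT);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        address_id INTEGER REFERENCES address(id)  -- nullable foreign key
    );
    INSERT INTO address VALUES (1, '20001');        -- invented sample data
    INSERT INTO person VALUES (1, 'Ann', 1);
    INSERT INTO person VALUES (2, 'Bob', NULL);     -- person without address
""")

# One query returns every person; zipcode is NULL where no address exists.
JOINED_QUERY = """
    SELECT p.id, p.name, a.zipcode
    FROM person p
    LEFT JOIN address a ON a.id = p.address_id
    ORDER BY p.id
"""

rows = conn.execute(JOINED_QUERY).fetchall()
for row in rows:
    print(row)
```

A SELECT of this shape could presumably serve as the single query of one DIH entity, replacing the nested address entity; whether that fits the larger joins mentioned in the question would need to be checked case by case.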
how to patch?
I want to use the fast highlighter in Solr 1.4 and found an issue at https://issues.apache.org/jira/browse/SOLR-1268

File Name                    Date Attached         Attached By      Size
SOLR-1268.patch              2010-02-05 10:32 PM   Koji Sekiguchi   4 kB
SOLR-1268-0_fragsize.patch   2010-02-04 10:43 PM   Koji Sekiguchi   2 kB
SOLR-1268-0_fragsize.patch   2010-01-29 11:51 PM   Koji Sekiguchi   1 kB
SOLR-1268.patch              2010-01-03 11:27 PM   Koji Sekiguchi   48 kB
SOLR-1268.patch              2010-01-02 10:42 PM   Koji Sekiguchi   35 kB

I am not familiar with patches. There are 3 patches named SOLR-1268; which one should I use? I guess I should use the newest one, but it's so small. SOLR-1268-0_fragsize also has 2 files.
Re: Dynamic dataConfig files in DIH
On Fri, Jun 11, 2010 at 11:13 PM, Chris Hostetter wrote:

> : Is there a way to dynamically point which dataConfig file to use to import
> : using DIH without using the defaults hardcoded in solrconfig.xml?
>
> what do you mean by "dynamically" ? ... it's a query param, so you can
> specify the file name in the url when you issue the command.

No, it is not. The config file is not reloaded for every request. We should enhance DIH to do so. However, the whole data-config file can be sent as a request param, and that works (this is what the DIH debug mode uses).

> -Hoss

--
Noble Paul | Systems Architect | AOL | http://aol.com
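As a rough illustration of sending the whole data-config as a request param, the Python sketch below builds such a request URL. Several things here are assumptions rather than facts from this thread: the handler path `/dataimport`, the host, the use of `debug=true` together with a `dataConfig` parameter (which is how the DIH debug mode is understood to pass the config), and the placeholder XML. `build_debug_import_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_debug_import_url(base_url, data_config_xml, command="full-import"):
    """Build a DIH request URL that carries the data-config inline.

    Assumes the handler is registered at /dataimport and that, in debug
    mode, it reads the config from the `dataConfig` request parameter.
    """
    params = {
        "command": command,
        "debug": "true",                # assumed: inline config is a debug-mode feature
        "dataConfig": data_config_xml,  # the complete data-config XML as a string
    }
    return f"{base_url}/dataimport?{urlencode(params)}"


# Placeholder config; a real one would contain dataSource and entity elements.
config_xml = "<dataConfig><!-- entities go here --></dataConfig>"
url = build_debug_import_url("http://localhost:8983/solr", config_xml)
print(url)
```

A real data-config is usually too long for a comfortable GET URL, so sending the same parameters in a POST body is probably the more practical variant.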
Re: Solr DataConfig / DIH Question
This looks like a common problem. I guess DIH should handle this more gracefully. Instead of firing a query and failing, it should not fire the query at all if any of the values are missing. This can be made configurable if needed.

On Sun, Jun 13, 2010 at 9:14 AM, Lance Norskog wrote:
> This is a slow way to do this; databases are capable of doing this
> join and feeding the results very efficiently.
>
> The 'skipDoc' feature allows you to break out of the processing chain
> after the first query. It is used in the wikipedia example.
>
> http://wiki.apache.org/solr/DataImportHandler
>
> On Sat, Jun 12, 2010 at 6:37 PM, Holmes, Charles V. wrote:
> > I'm putting together an entity. A simplified version of the database
> > schema is below. There is a 1-[0,1] relationship between Person and Address
> > with address_id being the nullable foreign key. If it makes any difference,
> > I'm using SQL Server 2005 on the backend.
> >
> > Person [id (pk), name, address_id (fk)]
> > Address [id (pk), zipcode]
> >
> > My data config looks like the one below. This naturally fails when the
> > address_id is null since the query ends up being "select * from user.address
> > where id = ".
> >
> > Query="select * from user.person">
> > Query="select * from user.address where id = ${person.address_id}"
> >
> > I've worked around it by using a config like this one. However, this
> > makes the queries quite complex for some of my larger joins.
> >
> > Query="select * from user.person">
> > Query="select * from user.address where id = (select address_id
> > from user.person where id = ${person.id})">
> >
> > Is there a cleaner / better way of handling these type of relationships?
> > I've also tried to specify a default in the Solr schema, but that seems to
> > only work after all the data is indexed which makes sense but surprised me
> > initially. BTW, thanks for the great DIH tutorial on the wiki!
> >
> > Thanks!
> > Charles
>
> --
> Lance Norskog
> goks...@gmail.com

--
Noble Paul | Systems Architect | AOL | http://aol.com