Re: Calculating tf-idf

2016-01-29 Thread Péter Király
I found the solution: https://wiki.apache.org/solr/TermVectorComponent. I did not know about it before, but that's exactly what I need. Regards, Péter 2016-01-29 16:09 GMT+01:00 Péter Király : > Dear all, > > I am working on a research project in which I create an OS tool which >
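
For readers landing on this thread, here is a minimal sketch of how a TermVectorComponent request could be assembled. It assumes the component is exposed under a /tvrh handler (as in the Solr example configuration), that the field was indexed with termVectors="true", and that the base URL, the query and the field name "text" are placeholders, not values from the original message.

```python
from urllib.parse import urlencode


def term_vector_url(base_url, query, field):
    """Build a TermVectorComponent request URL (handler path /tvrh is an assumption)."""
    params = {
        "q": query,
        "fl": "id",
        "tv": "true",         # switch the component on
        "tv.fl": field,       # field whose term vectors are returned
        "tv.tf": "true",      # term frequency per document
        "tv.df": "true",      # document frequency per term
        "tv.tf_idf": "true",  # tf-idf value as computed by the component
        "wt": "json",
    }
    return base_url + "/tvrh?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host, query and field name, for illustration only.
    print(term_vector_url("http://localhost:8983/solr", "id:1", "text"))
```

The function only builds the URL; fetching and parsing the response is left out, because the exact response layout depends on the Solr version.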

Calculating tf-idf

2016-01-29 Thread Péter Király
either in Solr, or with the Lucene API? Thank you very much in advance! Péter -- Péter Király software developer GWDG, Göttingen - Europeana - eXtensible Catalog - The Code4Lib Journal http://linkedin.com/in/peterkiraly

Re: Pls help: Very long query - what to do?

2012-11-21 Thread Péter Király
Hi, you have to set the maxHttpHeaderSize attribute of the <Connector> element in server.xml. The default is about 8 KB. For more detail, see: http://serverfault.com/questions/56691/whats-the-maximum-url-length-in-tomcat Regards, Péter Király portal backend developer http://europeana.eu 2012/11/21 uwe72

Re: Solr 4.0 admin panel

2012-10-31 Thread Péter Király
ent.HttpShardHandlerFactory getParameter > INFO: Setting sizeOfQueue to: -1 > Oct 31, 2012 11:44:47 AM > org.apache.solr.handler.component.HttpShardHandlerFactory getParameter > INFO: Setting fairnessPolicy to: false > Oct 31, 2012 11:44:47 AM org.apache.solr.client.solrj.impl.HttpClientUti

Re: [ANNOUNCE] Apache Solr 4.0 released.

2012-10-12 Thread Péter Király
* Numerous bug fixes and optimizations. > > Please report any feedback to the mailing lists > (http://lucene.apache.org/solr/discussion.html) > > Note: The Apache Software Foundation uses an extensive mirroring > network for distributing releases. It is possible that the mirr

Re: Semantic document format... standards?

2012-09-11 Thread Péter Király
> > > > Thanks, > > Otis > > > > Performance Monitoring - Solr - ElasticSearch - HBase - > http://sematext.com/spm > > > > Search Analytics - http://sematext.com/search-analytics/index.html > On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic -- Péter Király eXtensible Catalog http://eXtensibleCatalog.org http://drupal.org/project/xc

Re: simple query help

2012-05-15 Thread Péter Király
> I get 1 document returned (doc A) >> >> If I search >> q=skcode:2021049 and ent_no:1040970907 >> >> I get 1 document returned (doc B) >> >> >> But if I search >> q=skcode:2021051 and flength:368.0 or skcode:2021049 and ent_no:1040970907 >>

Re: XPath with ExtractingRequestHandler

2011-12-15 Thread Péter Király
class being used does not support the '//' > syntax. > > Is there anyway to configure Tika to use a different XPath evaluation class? > > -- Péter Király eXtensible Catalog http://eXtensibleCatalog.org http://drupal.org/project/xc

Re: java.net.SocketException: Too many open files

2011-10-25 Thread Péter Király
e connection pool in dbcp etc.. >> >> I am not experienced on java so please help to resolved this problem. >> >>  solr version: 3.4 >> >> regards >> Jonty > -- Péter Király eXtensible Catalog http://eXtensibleCatalog.org http://drupal.org/project/xc

Re: SOLR 3.3.0 multivalued field sort problem

2011-08-12 Thread Péter Király
Hi, There is no direct solution; you have to create single-valued field(s) to sort on. I am aware of two workarounds: - you can use a random or a given (e.g. the first) instance of the multiple values of the field, and that would be your sortable field. - you can create two sortable fields:
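
A rough sketch of the first workaround, done on the client side before the document is sent to Solr. The field names "author" and "author_sort" are hypothetical, and the single-valued sort field would also have to be declared in the schema.

```python
def add_sort_field(doc, source_field, sort_field):
    """Return a copy of doc with sort_field set to the first value of source_field."""
    values = doc.get(source_field) or []
    enriched = dict(doc)  # leave the caller's document untouched
    if values:
        enriched[sort_field] = values[0]  # "a given (e.g. the first) instance"
    return enriched


if __name__ == "__main__":
    # Hypothetical document, for illustration only.
    doc = {"id": "1", "author": ["Smith", "Adams"]}
    print(add_sort_field(doc, "author", "author_sort"))
```

Documents with no value in the source field simply get no sort field, so where they appear in a sorted result depends on how the field is configured.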

Re: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.ICUTokenizerFactory'

2011-08-12 Thread Péter Király
Hi Satish, > : > > I also added the following files to my apache-solr-3.3.0\example\lib > : > folder: I use ICU, and I copied the jar files not into example/lib as you did, but into example/solr/lib. First I had to create that directory. It works for me under 3.1, 3.2 and 3.3. In multicore setup

Re: How to query solr status

2011-07-26 Thread Péter Király
You can use the Luke request handler; to speed it up, set the numTerms parameter to zero, like http://localhost:8983/solr/admin/luke?numTerms=0 It will give you information about the optimized state of the index as true. More about this on the Solr wiki: http://wiki.apache.org/solr/LukeRequestHandler 201
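
A small sketch of the Luke request described above. It assumes the JSON response writer (wt=json) is available and that the response carries an "index" section with an "optimized" entry, as older Solr versions do; the sample response in the usage part is made up for illustration.

```python
import json
from urllib.parse import urlencode


def luke_url(base_url):
    """Build the Luke request URL; numTerms=0 skips the costly top-terms listing."""
    return base_url + "/admin/luke?" + urlencode({"numTerms": 0, "wt": "json"})


def is_optimized(luke_json):
    """Return the 'optimized' entry of the index section, or None if it is absent."""
    index_info = json.loads(luke_json).get("index", {})
    return index_info.get("optimized")


if __name__ == "__main__":
    print(luke_url("http://localhost:8983/solr"))
    # Made-up response body, reduced to the part that is read here.
    print(is_optimized('{"index": {"numDocs": 10, "optimized": true}}'))
```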

Re: How to find whether solr server is running or not

2011-07-19 Thread Péter Király
You can use ping: http://host:port/solr/admin/ping The response is something like this: 05all10allsolrpingquerysearchOK or with JSON response: http://host:port/solr/admin/ping?wt=json {"responseHeader":{"status":0,"QTime":2,"params":{"echoParams":"all","rows":"10","echoParams":"all","q":"solrp
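
The check can be scripted along these lines. This is only a sketch: it assumes the ping handler is called with wt=json and answers with a top-level "status" of "OK", and the sample bodies below are shortened, made-up responses.

```python
import json


def ping_ok(body):
    """Return True if a ping response body (JSON text) reports status OK."""
    try:
        return json.loads(body).get("status") == "OK"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as "not running".
        return False


if __name__ == "__main__":
    sample = '{"responseHeader": {"status": 0, "QTime": 2}, "status": "OK"}'
    print(ping_ok(sample))
    print(ping_ok("<html>503 Service Unavailable</html>"))
```

In practice a failed connection or a non-200 HTTP status should count as "not running" as well; that part is left to whatever HTTP client is in use.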

Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Péter Király
> [ ]  I always use the JDK logging as bundled in solr.war, that's perfect > [x]  I sometimes use log4j or another framework and am happy with > re-packaging solr.war > [ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at > deploy time > [ ]  Let me choose whether to bundle a

Re: Indexing Flickr and Panaramio

2011-04-12 Thread Péter Király
Hi, I indexed Flickr into Lucene about 3 years ago. There is a Flickr API, which covers almost everything you need (as I remember, not every Flickr feature was implemented in the API at that time; e.g. the "collection" was not searchable). You can harvest by user ID or by searching for a topic. You can

Re: Transform a SolrDocument into a SolrInputDocument

2011-03-21 Thread Péter Király
Hi Marc, as far as I know the best way to do it is working from the original source, because it is possible that not all fields are stored, and the original content of the non-stored fields is not inside the Solr document. Péter 2011/3/21 Marc SCHNEIDER : > Hello, > > I'd like to know the faste

Re: OAI on SOLR already done?

2011-02-02 Thread Péter Király
oolge matches. > > paul > > > Le 2 févr. 2011 à 20:46, Péter Király a écrit : > >> Hi, >> >> I don't know whether it fits to your need, but we are builing a tool >> based on Drupal (eXtensible Catalog Drupal Toolkit), which can harvest >> with OA

Re: OAI on SOLR already done?

2011-02-02 Thread Péter Király
Hi, I don't know whether it fits your needs, but we are building a tool based on Drupal (eXtensible Catalog Drupal Toolkit), which can harvest with OAI-PMH and index the harvested records into Solr. The records are harvested, processed, and stored in MySQL, then we index them into Solr. We creat

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Péter Király
> [x] ASF Mirrors (linked in our release announcements or via the Lucene > website) > [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) > [x] I/we build them from source via an SVN/Git checkout. > I rarely build, only if I would like to try an interesting patch. > [] Other (someon

Re: PHP PECL solr API library

2011-01-09 Thread Péter Király
I have run some speed tests, and this library is slightly (10-20%) quicker than the library written in pure PHP. I haven't compared the API. Király Péter http://eXtensibleCatalog.org 2011/1/10 Lukas Kahwe Smith : > > On 10.01.2011, at 08:16, Dennis Gearon wrote: > >> Anyone have any experience usi

Re: SOLR Thesaurus

2010-12-10 Thread Péter Király
Hi Chris, thanks for your description. I should think about this a little bit more, then I will ask about some details. The main problem is that synonyms are one kind of relation, and a thesaurus may contain 6-10 kinds of relations. And it depends on the user which types of relations he would like

Re: SOLR Thesaurus

2010-12-10 Thread Péter Király
Hi Lee, according to my vision the user could decide which relationship types he would like to attach to his search, and the application would call his attention to other possibilities. So there would be no heuristic method applied, because e.g. broader terms would cause lots of misleading result

Re: SOLR Thesaurus

2010-12-10 Thread Péter Király
I will also try to define the problem. In the library world there are some general and specialized thesauri, which reveal the relations between concepts. The relations have types, as Lee described: Preferred Term (PT), Broader Terms (BT), Narrower Terms (NT), Related Terms (RT) and others. Some of these thes

Re: How badly does NTFS file fragmentation impact search performance? 1.1X? 10X? 100X?

2010-12-08 Thread Péter Király
Hi Will, I cannot give you exact numbers, but yes, defragmentation in Windows is important, and it will speed up searches. I guess that the ratio is determined by the number of file fragments. In a Windows environment I regularly run defragmentation, and usually I use drives for Lucene/Solr index

Re: LuceneRevolution - NoSQL: A comparison

2010-10-13 Thread Péter Király
2010/10/12 Peter Keegan : > I listened with great interest to Grant's presentation of the NoSQL > comparisons/alternatives to Solr/Lucene. My question: will this presentation be available somewhere? I cannot find any presentation material on the conference web site. Király Péter http://eXtensible

Re: trie

2010-09-21 Thread Péter Király
You can read about it in Lucene in Action, second edition. Péter 2010/9/21 Papp Richard : >  is there any good tutorial how to use and what is trie? what I found on the > net is really blurry. > > rgeards, >  Rich