Re: URI is too long

2016-01-31 Thread Paul Libbrecht
How about using POST? paul > Salman Ansari > 31 January 2016 at 15:20 > Hi, > > I am building a long query containing multiple ORs between query terms. I > started to receive the following exception: > > The remote server returned an error: (414) Request-URI Too L
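A minimal sketch of the POST suggestion, assuming a standard Solr `/select` handler that accepts form-encoded parameters. The core URL, the field name `text` and the helper name `build_select_post` are hypothetical; the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_post(base_url, terms, field="text"):
    """Build (but do not send) a POST request carrying a long OR query."""
    # Join the terms with OR; the field name is an assumption for illustration.
    query = " OR ".join(f'{field}:"{term}"' for term in terms)
    # The parameters travel in the request body, so the URI itself stays short.
    body = urlencode({"q": query, "wt": "json"}).encode("utf-8")
    return Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )


# Hypothetical core URL, for illustration only.
req = build_select_post(
    "http://localhost:8983/solr/mycore/select",
    [f"term{i}" for i in range(5000)],
)
```

Passing `req` to `urllib.request.urlopen` would send it; the query length then counts against the server's POST body limit rather than its URI limit.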

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Paul Libbrecht
This looks like the stored content is shortened. Can it be? Can you see that inside the docs? paul > Evert R. > 14 February 2016 at 11:26 > Hi There, > > I have a situation where started a techproducts, without any modification, > post a pdf file. When searching as:

pre-loaded function-query?

2015-08-18 Thread Paul Libbrecht
Hello Solr experts, I'm writing a "query expansion" QueryComponent which takes web-app parameters (e.g. profile information) and turns them into a solr query. Thus far I've used lucene TermQuery-ies with success. Now, I would like to use something a bit more elaborate. Either I write it with qui

Re: pre-loaded function-query?

2015-08-18 Thread Paul Libbrecht
Doug Turnbull wrote: > I'm not sure if you mean organizing function queries under the hood in a > query component or externally. > > Externally, I've always followed John Berryman's great advice for working > with Solr when dealing with complex/reusable function queries and boosts > http://opensour

Re: Strange interpretation of invalid ISO date strings

2015-09-06 Thread Paul Libbrecht
Just a word of warning: iso-8601, the date format standard, is quite big, to say the least, and I thus expect very few implementations to be complete.  I survived one such interoperability issue with Safari on iOS6. While they (and JS I think) claim iso-8601, it was not complete and fine grained

Re: Ideas

2015-09-21 Thread Paul Libbrecht
Writing a query component would be pretty easy or? It would throw an exception if crazy numbers are requested... I can provide a simple example of a maven project for a query component. Paul William Bell wrote: > We have some Denial of service attacks on our web site. SOLR threads are > going c
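The check such a component would perform can be sketched as follows, assuming "crazy numbers" refers to paging parameters such as `rows` and `start`. A real Solr query component would be written in Java; the limits and all names below are hypothetical and only illustrate the logic.

```python
MAX_ROWS = 1000       # assumed ceiling, tune to the application
MAX_START = 100_000   # assumed ceiling, tune to the application


class RequestTooLarge(ValueError):
    """Raised when a request asks for an unreasonable slice of results."""


def check_paging(params):
    """Validate the paging parameters and return them as integers."""
    rows = int(params.get("rows", 10))
    start = int(params.get("start", 0))
    if not 0 <= rows <= MAX_ROWS:
        raise RequestTooLarge(f"rows={rows} is outside 0..{MAX_ROWS}")
    if not 0 <= start <= MAX_START:
        raise RequestTooLarge(f"start={start} is outside 0..{MAX_START}")
    return rows, start
```

Rejecting the request before the search runs keeps the expensive work from starting at all.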

Re: Instant Page Previews

2015-10-08 Thread Paul Libbrecht
This is a very nice start Charlie, I'd warn a bit however, on the value of such previews: automated previews of web-page can be quite far from what users might be remembering a page should look like. In particular all tool pages typically show quite "empty" or "initial" state in such automatic pre

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht
I believe that very many installations of solr actually need a query expansion such as the one you describe below with an indexing of each textual field in multiple forms (string, straight (whitespace/ideograms), stemmed, phonetic). Thanks to edismax, I think, you would do the following expansi

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht
Alexandre, I guess you are talking about that post: http://lucidworks.com/blog/2015/06/06/query-autofiltering-extended-language-logic-search/ I think it is very often impossible to solve properly. Words such as "direction" have very many meanings and would come in different fields. In IMDB, wo

Re: Boosting a document score when advertised! Please help!

2015-11-05 Thread Paul Libbrecht
Alessandro, none of them seem to match what I'd expect to be done: given an extra param that indicates the author, for each query, add an extra boosting. Christian, I used to do that with a query component (in java) but I think that nowadays you can do that with the bq parameter of edismax. paul

Re: Arabic analyser

2015-11-10 Thread Paul Libbrecht
Mahmoud, there is an arabic analyzer: https://wiki.apache.org/solr/LanguageAnalysis#Arabic doesn't it do what you describe? Synonyms probably work there too. Paul > Mahmoud Almokadem > 9 novembre 2015 17:47 > Thanks Jack, > > This is a good solution, but we have

Re: Indexing Wikipedia

2015-12-04 Thread Paul Libbrecht
Simply... some fields are not stored so they are only searched through (being indexed) but not given back? (title and text in the tutorial you refer to). Are these the missing fields? Paul > Kate Kas > 5 December 2015 00:23 > Hi, > > i tried to index .xml files from wi

require diversity in results?

2015-04-24 Thread Paul Libbrecht
Hello list, I'm wondering if there could be extra parameters or query operators with which I could impose that sorting by relevance be relaxed so that there's a minimum diversity in some fields in the first page of results. For example, I'd like the search results to contain at least three po

Re: Anybody uses Solr JMX?

2014-05-04 Thread Paul Libbrecht
Also, Zabbix and Nagios do read from JMX. Zabbix has a "prototype" for SOLR which is a simple way to gather an amount of data from solr and do, for example, archiving and plotting of cache values. paul On 5 May 2014 at 04:37, Ahmet Arslan wrote: > Hi, > > It looks like JMX is a standard

Re: Anybody uses Solr JMX?

2014-05-04 Thread Paul Libbrecht
> Thank you everybody for the links and explanations. > > I am still curious whether JMX exposes more details than the Admin UI? > I am thinking of a troubleshooting context, rather than long-term > monitoring one. JMX is multi-purpose. So, in principle, it can offer considerably more. I've seen

Re: Is it possible to cluster on search results but return only clusters?

2014-05-06 Thread Paul Libbrecht
put rows to zero? Exploit the facets as "clusters" ? paul Le 6 mai 2014 à 16:42, Sebastián Ramírez a écrit : > I have this query / URL > > http://example.com:8983/solr/collection1/clustering?q=%28title:%22+Atlantis%22+~100+OR+content:%22+Atlantis%22+~100%29&rows=3001&carrot.snippet=content&c
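A minimal sketch of the two suggestions combined, assuming the groups of interest can be approximated by an indexed field. `rows`, `facet`, `facet.field` and `facet.mincount` are standard Solr parameters; the field name `category` and the helper name are hypothetical.

```python
from urllib.parse import urlencode


def facet_only_params(query, facet_field):
    """Ask for zero documents and only the facet counts of one field."""
    return urlencode({
        "q": query,
        "rows": 0,                  # return no documents, only counts
        "facet": "true",
        "facet.field": facet_field, # assumed field whose values act as groups
        "facet.mincount": 1,        # skip values that match nothing
    })


params = facet_only_params("title:Atlantis", "category")
```

Facet values are exact field values, not computed clusters, so this only helps when such a field already exists in the index.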

Re: Website running Solr

2014-05-11 Thread Paul Libbrecht
Not with certainty as solr may be working far behind another set of tools that make queries (and nothing licensing prevents it). If you get a software that has maybe Solr inside, I think the credits section should include a mention of some sort. However, there may be hints if a website uses solr,

Re: Solr on S3FileSystem, Kosmos, GlusterFS, etc….

2014-06-24 Thread Paul Libbrecht
I've always been under the impression that file-system-access-speed is crucial for Lucene-based storage and have always advocated to not use NFS for that (for which we had slowness of a factor of 5 approximately). Has any performance measurement been made for such a setting? Is FS-caching sudde

Re: multilingual search

2014-07-04 Thread Paul Libbrecht
To do just what Jack described, I often write a solr query component that does "query expansion". Based on some parameters I can recognize to be a language hint (e.g. the language of the environment they search in, the browser's accept-language) I reformulate the query into a query in the fields
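The reformulation step can be sketched as below, assuming language-specific fields are named with a `_xx` suffix and that the hint arrives as an Accept-Language style string. The field names, the boost of 2 and the list of supported languages are illustrative assumptions; a real component would do this inside Solr, in Java.

```python
def localized_qf(base_fields, language_hint, supported=("en", "fr", "de")):
    """Return a qf string that prefers the fields of the hinted language."""
    # Keep only the primary language of the first entry, e.g. "fr-FR,fr;q=0.9" -> "fr".
    lang = language_hint.split(",")[0].split(";")[0].split("-")[0].strip().lower()
    parts = []
    if lang in supported:
        # Language-specific fields come first, with a higher (assumed) boost.
        parts += [f"{field}_{lang}^2" for field in base_fields]
    # The unsuffixed fields stay in as a fallback.
    parts += list(base_fields)
    return " ".join(parts)
```

For example, `localized_qf(["title", "text"], "fr-FR,fr;q=0.9,en;q=0.8")` yields `title_fr^2 text_fr^2 title text`, which can be passed as the `qf` parameter.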

Re: multilingual search

2014-07-04 Thread Paul Libbrecht
> 1. Modify the qf parameter directly by either adding the "_xx" language > suffix to each field in qf, or replacing the "xx" for any qf fields that > already have an "_xx" suffix. > 2. Have separate "qf_xx" parameters which are customized for specific > languages and then copy the language-spec

Re: Character encoding problems

2014-07-29 Thread Paul Libbrecht
> If you are seeing " appelé au téléphone" in the browser, I would guess > that the data is being rendered in UTF-8 by your server and the content type > of the html is set to iso-8859-1 or not being set and your browser is > defaulting to iso-8859-1. > > You can force the encoding to utf-8

Re: Anybody uses Solr JMX?

2014-08-06 Thread Paul Libbrecht
Hello Otis, this looks like an excellent idea! I'm in need of that, erm… last week and probably this one too. Is there not a risk that reading certain JMX properties actually hogs the process? (or is it by design that MBeans are supposed to be read without any lock effect?). thanks for the hin

Re: what os env you use to develop lucene or solr?

2014-08-11 Thread Paul Libbrecht
I have used MacOSX for development for more than 10 years. It's, by far, the user-friendliest Unix-based system. So copy and paste works "correctly" from the terminal to the IDE. Find in the terminal is nicely behaving (really!). This is kilometers away from XWindows' terminals and megameters away fro

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Paul Libbrecht
I personally felt Tomcat to be in a more appropriate community, that of the Apache Foundation, than Jetty. Also, jetty always has been striving for simplicity and that's really not always what you intend to when you plan an app-server. E.g. features such as the manager or mod_ajp appeared import

Re: weak documents

2013-11-27 Thread Paul Libbrecht
Thomas, our experience with Curriki.org is that evaluating what I call the "related documents" is a procedure that needs access to the complete content and thus is run at the DB level and not the Solr level. For example, if a user changes a part of his name, we need to reindex all of his resou

Re: How solr text search finding work

2013-11-28 Thread Paul Libbrecht
Viresh, there's two ways to solve this. - Using the CompoundWordsAnalyzer. I still haven't been able to find an easy to embark method into there. That would decompose, at indexing and query time, the term Kreditgeber into kredit and geber. For a higher precision, you probably want to do it at

Re: Call to Solr via TCP

2013-12-10 Thread Paul Libbrecht
Zwer, I think it may be a bit dangerous as jetty may start to do some connection management and expect the client to do so. However, if you look into http/1.0 you have a little chance that doing simple http calls is as simple as socket connections. What could be the reason not to use a decent h

Re: Chaining plugins

2013-12-26 Thread Paul Libbrecht
I have subclassed the query component to do so. Using params, you can get almost everything thinkable that is not too much documented. paul On 26 déc. 2013, at 15:59, elmerfudd wrote: > I would like to develope a search handler that is doing some logic and then > just sends the query to the de

Re: Remove stemming without reindexing - currently using KStem

2014-02-02 Thread Paul Libbrecht
Abhishek, stemming is applied before the tokens get into the index. Changing the stemming of the indexer cannot be done without reindexing. paul Le 2 févr. 2014 à 06:23, "abhishek jain" a écrit : > Hi Friends, > > Is it possible to remove stemming without having to reindex the entire data, >

Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Paul Libbrecht
Navaa, you need query expansion for that. E.g. if your query goes through dismax, you need to add the two field names to the qf parameter. The nice thing is that qf can be: text^3.0 text.stemmed^2 text.phonetic^1 And thus exact matches are preferred to stemmed or phonetic matches. This is conf
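A minimal sketch of how such a boosted `qf` could be assembled on the client side. `defType`, `q` and `qf` are standard Solr parameters; the field names are illustrative and assumed to exist in the schema, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def dismax_params(user_query, boosts):
    """Build dismax request parameters from a field -> boost mapping."""
    # Each field is written as name^boost; higher boosts rank those matches first.
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return urlencode({"defType": "dismax", "q": user_query, "qf": qf})


params = dismax_params(
    "smith",
    {"text": 3.0, "text.stemmed": 2, "text.phonetic": 1},
)
```

The same `qf` value can also be set as a default of the request handler in solrconfig.xml, so that clients only send `q`.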

Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Paul Libbrecht
Navaa, You need the query to be sent to the two fields. In dismax, this is easy. Paul On 12 février 2014 14:22:33 HNEC, Navaa wrote: >Hi, >I am using solr for searching phoneticly equivalent string >my schema contains... >positionIncrementGap="100"> > > >

Re: How to implement multilingual word components fields schema?

2014-09-09 Thread Paul Libbrecht
Ilia, one aspect you surely lose with a single field approach is the differentiation of semantic fields in different languages for words that sound the same. The words "sitting" and "directions" are easy examples that have fully different semantics in French and English, at least. "directions"

Re: Facets not supporting multi language?

2014-09-11 Thread Paul Libbrecht
The way this is done in drupal and probably many others is that the facet fields are keywords from a taxonomy. If you want to facet through a single language, you probably want to separate the fields where you index each of the languages (so a field "text-en", "text-fr" through which you would fac

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht
Hello Koji, how would you compare that to SemanticVectors? paul On 20 nov. 2014, at 10:10, Koji Sekiguchi wrote: > Hello, > > It's my pleasure to share that I have an interesting tool "word2vec for > Lucene" > available at https://github.com/kojisekig/word2vec-lucene . > > As you can imagin

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht
> operations vector('Paris') - vector('France') + vector('Italy') results in a > vector that is very > close to vector('Rome'), and vector('king') - vector('man') + vector('woman') > is close to > vector('

Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Paul Libbrecht
Upayavira, on the lucene list, two tools are sometimes talked about which might be doing some of what you are searching: - semanticvectors (https://code.google.com/p/semanticvectors) - word2vec https://github.com/kojisekig/word2vec-lucene/i Maybe it helps? I'm under the impression that you are ra

Re: Differentiate direction.

2014-12-18 Thread Paul Libbrecht
Kind of depends on how you're going to query. If you're going to query always with a direction then, you can probably prefix all tokens with the direction. If you're going to query always simple text bits, then using phrase search with d1 and d2 being words might also work. If you're going for fu

Re: searching both english and japanese

2013-07-07 Thread Paul Libbrecht
Shalom, isn't the StandardAnalyzer supposed to take care of "forking" in case of ideograms? I.e. use a Japanese-friendly analyzer for japanese characters and an English-friendly analyzer otherwise. As Jack pointed out, edismax is nifty to expand a query on multiple fields. If you need to do mor

Re: Where is the webapps directory of servlet container?

2013-08-17 Thread Paul Libbrecht
>> What should I do? > Can you help make me understand the work flow? Kamaljeet, in most servlet-containers (e.g. Tomcat or Jetty), there is such a directory called webapps. In Sun Java App Server it is inside domains//applications/j2ee-modules/. Maybe it helps? If not, please indicate the ser

Re: Where is the webapps directory of servlet container?

2013-08-17 Thread Paul Libbrecht
>> Maybe it helps? >> If not, please indicate the servlet container you chose. > > I have installed java and solr 4.4.0. I guess I need to install Jetty > or Tomat. Not able to decide among both. But tried with Jetty. Is it > necessary to add new user to use Jetty?? Jetty comes bundled with Solr.

Re: Where is the webapps directory of servlet container?

2013-08-17 Thread Paul Libbrecht
> Are they refering to solr-4.4.0/example/webapps directory here? > https://cwiki.apache.org/confluence/display/solr/Installing+Solr > But solr.war is already placed there. Is it fine? I believe it is intended to be fine indeed ;-). However any other installation with a webapps directory would be

Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Paul Libbrecht
Why not simply create a meta search engine that indexes everything of each of the nodes.? (I think one calls this harvesting) I believe that this the way to avoid all sorts of performance bottleneck. As far as I could analyze, the performance of a federated search is the performance of the leas

Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Paul Libbrecht
inged on the lack > of Federated Search. I do not have the hubris to think I can fix that, and > it is not really my role to try, but something that works without > Harvesting and local indexing is obviously desirable to Enterprise Search > users. > > > > On Mon, Aug

Re: More on topic of Meta-search/Federated Search with Solr

2013-09-05 Thread Paul Libbrecht
Hello list, A student of a friend of mine made his masters on that topic, especially about federated ranking. I have copied his text here: http://direct.hoplahup.net/tmp/FederatedRanking-Koblischke-2009.pdf Feel free to contact me to contact Robert Koblischke for questions. Pa

Re: Book text with chapter line number

2013-04-24 Thread Paul Libbrecht
It's easy to then store a map of "term position" to line-number and page-number along with each paragraph, or? Paul On 24 avr. 2013, at 16:24, Timothy Potter wrote: > Chapter seems too broad and line seems too narrow -- have you thought > about paragraph level? Something like: > > docID, book
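One way to sketch such a map, assuming whitespace tokenization and one paragraph given as a list of lines. A real index would use the analyzer's term positions; the function names are hypothetical.

```python
from bisect import bisect_right


def build_line_index(lines):
    """Return, for each line, the term position at which it starts."""
    starts, position = [], 0
    for line in lines:
        starts.append(position)
        position += len(line.split())  # whitespace tokens, an assumption
    return starts


def line_of(starts, term_position):
    """Map a 0-based term position to a 1-based line number."""
    # The line containing the position is the last one starting at or before it.
    return bisect_right(starts, term_position)


paragraph = [
    "first line of text",
    "second line here",
    "and a third one",
]
starts = build_line_index(paragraph)
```

Here `starts` is `[0, 4, 7]`, so a hit at term position 5 maps to line 2. Storing `starts` next to each paragraph document is enough to recover the line of any matched term; a page map works the same way.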

Re: Good Desktop Search?

2013-05-03 Thread Paul Libbrecht
Savia, maybe not very mature yet, but someone on java-us...@lucene.apache.org announced such a tool the other day. I'm copying it below. I do not know of many otherwise. paul > Hi everybody, > just a simple question > is there any solr/lucene based desktop search project around someone might

Re: Solr + Groovy

2013-06-03 Thread Paul Libbrecht
Achim, have you considered the velocity-response-writer? Together with a thread-local I've been able to turn this thing into a jsp carrier so as to use JSPs as views for solr results. paul On 3 juin 2013, at 20:31, Achim Domma wrote: > Looks interesting, but it's just for the UpdateHandler. R

Re: Indexing PDF

2011-10-04 Thread Paul Libbrecht
full of boxes for me. Héctor, you need another way to reference these! (e.g. a URL) paul Le 4 oct. 2011 à 16:49, Héctor Trujillo a écrit : > Hi all, I'm indexing pdf's files with SolrJ, and most of them work. But with > some files I’ve got problems because they stored estrange characters. I got

Re: Indexing PDF

2011-10-05 Thread Paul Libbrecht
is file with a PDF Reader and I have no problems, and I don’t Know why >> referencing this file with and URL will fix this problem, can you help me? >> >> I'm working with SolrJ, from Java, does some have the same problem with >> SolrJ? >> >> >>

Re: URL Redirect

2011-10-06 Thread Paul Libbrecht
Simone, for such a work you need something external I think. I would use Apache's mod_rewrite which is super flexible for such purposes. Among others it can honour existing URL by either serving them reformulated (e.g. proxied) or by redirecting the browser to use it. Probably something as flexi

Re: capacity planning

2011-10-11 Thread Paul Libbrecht
My experience was 10% of the size. Le 11 oct. 2011 à 15:49, Erik Hatcher a écrit : > (roughly 35% the size, generally).

Re: add thumnail image for search result

2011-10-19 Thread Paul Libbrecht
Hadi, I do not think solr or solrj does this. are your document HTML documents? I would look in the crawler resources but I note that rendering is a rather server-unfriendly task and it bears some security risk if the documents are not fully trusted. In i2geo.net, we finally gave up on automate

Re: Does anybody has experience in Chinese soundex(sounds like) of SOLR?

2011-10-20 Thread Paul Libbrecht
Wouldn't the conversion to a western writing followed by Soundex or Metaphone be the right thing to try? I thought such conversions were mainstream. paul Le 20 oct. 2011 à 12:16, Otis Gospodnetic a écrit : > Hi, > > Wow, interesting question. Can soundex even be applied to a language like

Re: Implement Custom Soundex

2011-10-23 Thread Paul Libbrecht
Momo, if you have the conversion text to tokens then all you need to do is implement a custom analyzer, deploy it inside the solr webapp, then plug it into the schema. Is that the part that is hard? I thought the wiki was helpful there but maybe some other issue is holding you. One zoology of suc

Re: Out of memory, not during import or updates of the index

2011-11-10 Thread Paul Libbrecht
Steve, do you have any custom code in your Solr? We had out-of-memory errors just because of that, I was using one method to obtain the request which was leaking... had not read javadoc carefully enough. Since then, no leak. What do you do after the OoME? paul Le 9 nov. 2011 à 21:33, Steve F

Re: cache monitoring tools?

2011-12-09 Thread Paul Libbrecht
Allow me to chime in and ask a generic question about monitoring tools for people close to developers: are any of the tools mentioned in this thread actually able to show graphs of loads, e.g. cache counts or CPU load, in parallel to a console log or to an http request log?? I am working on such

Re: VelocityResponseWriter's future

2011-12-09 Thread Paul Libbrecht
Erik, The VelocityResponseWriter has solved a need by me: provide an interface that shows off an amount of the solr capability with queries close to a developer and a UI that you can mail to colleagues. The out-of-the-box-ness is crucial here. Adjust the vm files was also crucial (e.g. to creat

Re: VelocityResponseWriter's future

2011-12-09 Thread Paul Libbrecht
> new projects and then the UI needs adjustments to be in line with different >> data (as does the schema and solrconfig, but many folks don't adjust those >> either). Point taken that it certainly could be implemented/documented >> better though. >> >> Erik

Re: VelocityResponseWriter's future

2011-12-10 Thread Paul Libbrecht
Le 10 déc. 2011 à 02:56, Erik Hatcher a écrit : >> It's fast and easy but its testing ability is simply... unpredictable. > > I'm not sure I get what you mean by the testability though. Could you > clarify? Taken a bit literally with the VRW, there's this in a test case: Feeling sure of wh

Re: cache monitoring tools?

2011-12-11 Thread Paul Libbrecht
oad, file system stats, processes, > etc. > > http://munin-monitoring.org/ > > Justin > > Paul Libbrecht writes: > >> Allow me to chim in and ask a generic question about monitoring tools >> for people close to developers: are any of the tools mentioned in this &

Re: Restricting HTML search?

2010-08-24 Thread Paul Libbrecht
Wouldn't the usage of NekoHTML (as an XML-parser) and XPath be safer? I guess it all depends on the "quality" of the source document. paul On 25 August 2010 at 02:09, Lance Norskog wrote: I would do this with regular expressions. There is a Pattern Analyzer and a Tokenizer which do regul

Re: Would it be nuts to store a bunch of large attachments (images, videos) in stored but-not-indexed fields

2010-10-30 Thread Paul Libbrecht
I am quite interested by this story, including sample code. Back in Lucene 1.4 and 2.0 times, the reader vs string loading abilities was inconsistently handled and I switched to have one directory with thousands of files for our ActiveMath content storage. It works but fairly badly on smaller ma

Re: user session id / cookie to record search query

2012-11-21 Thread Paul Libbrecht
Record? E.g. output the cookie value of a given name in the log? Provided you use Apache mod_proxy, we do this by a special log-format. paul Le 21 nov. 2012 à 09:50, Romita Saha a écrit : > Hi All, > > Do anyone have an idea how to use user session id / cookie to record > search query from th

Re: Which fields matched?

2012-12-08 Thread Paul Libbrecht
We've used lucene-1999 with some success in ActiveMath to find the language that was matched. paul Le 8 déc. 2012 à 10:09, Mikhail Khludnev a écrit : > Jeff, > explain() algorithm is definitely too slow to be used at search time. There > is an approach which I'm aware of - watch for scorers du

Re: Figure out what value was matched in multi valued field

2013-03-13 Thread Paul Libbrecht
Mephisto, Maybe LUCENE-1999 helps you. We've used it with some success. Otherwise, you're left with highlighting. paul On 13 mars 2013, at 14:11, Jack Krupansky wrote: > Add &debugQuery=true to your query and examine the "explain" section, which > will show the terms/phrases that scored for e

Re: velocity in /srv/www

2013-03-13 Thread Paul Libbrecht
Guy, you'd need a proxy to go from one port (80 for the apache) to port 8983. Apache httpd will not run solr alone. Then the question of where you put the velocity page is "just a matter of configuration". A symbolic link probably. paul On 13 mars 2013, at 15:39, Guy Dobson wrote: > Fellow S

Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Paul Libbrecht
Nice, Chantal can you indicate there or here what kind of speed for integration tests you've reached with this, from a bare source to a successfully tested application? (e.g. with 100 documents) thanks in advance Paul On 14 mars 2013, at 09:29, Chantal Ackermann wrote: > Hi all, > > > thi

Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Paul Libbrecht
l the > plugins, importing the data, executing the tests. Well, Maven is certainly > not the fastest tool to start up and get going… > > If you are asking because you want to run rather a lot requests and test > their output - JMeter might be preferrable? > > Hope that was

Re: Writing a french Solr book - Ecrire un livre en français

2012-01-29 Thread Paul Libbrecht
Steve, I am a french speaker myself but have never seen such a thing. Not that I would have looked for it though (being a greedy lucene user since long). A web-search should tell you. And... a search by classical publishers. paul Le 29 janv. 2012 à 16:31, SR a écrit : > My main question is w

Re: language specific fields of "text"

2012-01-29 Thread Paul Libbrecht
(bing is a surprising name for a mailing list about search engine) My guess is that your document upload didn't contain the field text_en. Can it be? Paul bing a écrit : >Hi, all, > >In this thread, I would like to ask some technical questions about how >the >schema is defined to achieve lan

Re: language specific fields of "text"

2012-01-31 Thread Paul Libbrecht
Hello bing, Le 31 janv. 2012 à 04:27, bing a écrit : > I understand your point of missing "text_en" in the document. It is. Not > "text_en" but "text" exists. Unless you use copyField or upload the field as another element, it will not get fed. > But then it arises the question: isn't it dynami

Re: Language specific tokenizer for purpose of multilingual search in single-core solr,

2012-02-14 Thread Paul Libbrecht
only one field element? There should be two or? One for each language. paul Le 14 févr. 2012 à 07:34, bing a écrit : > > Hi, all, > > I want to do multilingual search in single-core solr. That requires to > define language specific tokenizers in scheme.xml. Say for example, I have > two toke

Re: Semantic autocomplete with Solr

2012-02-14 Thread Paul Libbrecht
facetting? paul Le 14 févr. 2012 à 23:10, Octavian Covalschi a écrit : > Hey guys, > > Has anyone done any kind of "smart" autocomplete? Let's say we have a web > store, and we'd like to autocomplete user's searches. So if I'll type in > "jacket" next word that will be suggested should be some

Re: Xml representation of indexed document

2012-03-10 Thread Paul Libbrecht
Chamnap, that'd be a view of the stored fields only (although Luke has some more to extract unstored fields). In my search projects I have an indexer and that component (not DIH) can display an "indexed view" of a document. maybe it helps. paul Le 10 mars 2012 à 08:57, Anupam Bhattacharya a

Re: Xml representation of indexed document

2012-03-10 Thread Paul Libbrecht
e. Any idea? > In your project, which indexer do you use? Previously, I wrote a ruby > script to index, but it took a lot of time. That's why I changed to DIH. > > > Chamnap > > > On Sat, Mar 10, 2012 at 4:41 PM, Paul Libbrecht wrote: > >> Chamnap, >>

Re: Vector based queries

2012-03-11 Thread Paul Libbrecht
Maybe that's exactly it but... given a document with n tokens A, and m tokens B, a query A^n B^m would find what you're looking for or? paul PS I've always viewed queries as linear forms on the vector space and I'd like to see this really mathematically written one day... Le 11 mars 2012 à 07:
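A minimal sketch of that idea, turning a document's token counts into a boosted query string of the form `A^n B^m`. The field name `text` is an assumption, and the tokens are assumed to need no escaping for the query parser.

```python
from collections import Counter


def boosted_query(tokens, field="text"):
    """Build a query where each distinct token is boosted by its frequency."""
    counts = Counter(tokens)
    # Sorting makes the output deterministic; order does not affect scoring.
    return " ".join(
        f"{field}:{token}^{count}" for token, count in sorted(counts.items())
    )
```

For the token list `["a", "b", "a", "a", "b"]` this gives `text:a^3 text:b^2`. Whether the resulting ranking approximates a vector comparison depends on the similarity in use.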

Re: Knowing which fields matched a search

2012-03-11 Thread Paul Libbrecht
Russel, there's been a thread on that in the lucene world... it's not really perfect yet. The suggestion to debugQuery gives only, to my experience, the explain monster which is good for developers (only). paul Le 11 mars 2012 à 08:40, William Bell a écrit : > debugQuery tells you. > > On F

Re: List of recommendation engines with solr

2012-03-13 Thread Paul Libbrecht
Just out of curiosity, does Mahout qualify as a recommender-engine, or is it rather a library for it with (potentially open-source) recommenders built on it, with a more specific purpose? The page: https://cwiki.apache.org/MAHOUT/powered-by-mahout.html does not seem to list many open-s

Re: UTF-8 encoding

2012-03-29 Thread Paul Libbrecht
Also, in case you use Apache's mod_proxy, be sure to use the nocanon attribute. (I don't know of an equivalent for mod_rewrite). In general, I tend also to advise also to change the default encoding of the java running the servlets... but I am sure you've done this. Tell us your success or lack

Re: UTF-8 encoding

2012-03-29 Thread Paul Libbrecht
Henri, look at velocity.properties. I have there: > input.encoding = UTF-8 Do you also? This is the vm files' encodings. Of course also make sure you edit these files in UTF-8 (using jEdit made it trustable to me). paul On 30 March 2012 at 08:49, henri.gour...@laposte.net wrote: > OK

Re: Trouble handling Unit symbol

2012-03-30 Thread Paul Libbrecht
Rajani, you need to look at the analysis tools of solr-admin, or even luke, to help you. paul Le 30 mars 2012 à 10:01, Rajani Maski a écrit : > Hi, > > We have data having such symbols like : µ > > > Indexed data has -Dose:"0 µL" > Language type - "English" > > > Now , when it is s

Re: Content privacy, search & index

2012-03-31 Thread Paul Libbrecht
Benjamin, I think implementing a QueryHandler that adds the necessary query is the right way to do that. It'd transform a query for "a b" into "+(a b) +(authorizedBit)" (to use the language of the default QueryParser but please not by substring, using the real query objects!). Recalculating th
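The transformation can be sketched with small stand-in query objects, so the restriction is added structurally rather than by editing the query string. These classes only mimic the shape of Lucene's term and boolean queries; in Solr the real classes would be used from Java. The field `acl` and the value `group42` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    field: str
    text: str

    def render(self):
        return f"{self.field}:{self.text}"


@dataclass(frozen=True)
class Bool:
    must: tuple = ()     # clauses that are required ("+")
    should: tuple = ()   # optional clauses

    def render(self):
        parts = [f"+{_group(q)}" for q in self.must]
        parts += [_group(q) for q in self.should]
        return " ".join(parts)


def _group(query):
    """Parenthesize nested boolean queries so their clauses stay together."""
    text = query.render()
    return f"({text})" if isinstance(query, Bool) else text


def restrict(user_query, authorization_query):
    """Require both the user's query and the authorization clause."""
    return Bool(must=(user_query, authorization_query))


user = Bool(should=(Term("text", "a"), Term("text", "b")))
secured = restrict(user, Term("acl", "group42"))
```

Here `secured.render()` gives `+(text:a text:b) +acl:group42`. Because the user's query is wrapped as a whole, nothing typed by the user can escape the authorization clause.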

Re: Content privacy, search & index

2012-04-01 Thread Paul Libbrecht
Hello Benjamin, On 1 April 2012 at 11:48, dbenjamin wrote: > You lost me :-) > You mean implementing a specific RequestHandler just for my needs ? I think a QueryComponent is enough, it'd extend QueryComponent. Its prepare method reads all the params and calls the ResponseBuilder's setQuery wi

Re: Quantiles in SOLR ???

2012-04-03 Thread Paul Libbrecht
Kashif, my knowledge in probability is limited but I believe the simple similarity function can be seen as a quantile. You can read about it in many places, I believe I read it in the Lucene in Action book. paul Le 3 avr. 2012 à 15:14, Kashif Khan a écrit : > Thanks for sharing your intellect

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Michael, I've been on this list and the lucene list for several years and have not found this yet. It's been one of the "neglected topics" to my taste. There is a CompoundAnalyzer but it requires the compounds to be dictionary based, as you indicate. I am convinced there's a way to build the de-compound

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Bernd, can you please say a little more? I think this list is ok to contain some description for commercial solutions that satisfy a request formulated on list. Is there any product at BASIS Tech that provides a compound-analyzer with a big dictionary of decomposed compounds in German? If yes,

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Le 12 avr. 2012 à 17:46, Michael Ludwig a écrit : >> Some compounds probably should not be decompounded, like "Fahrrad" >> (farhren/Rad). With a dictionary-based stemmer, you might decide to >> avoid decompounding for words in the dictionary. > > Good point. More or less, Fahrrad is generally ab

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread Paul Libbrecht
Benson, In mid 2009, I had such a question answered with a nifty bitwise manipulation of the score, at the cost of a little precision. For each result I could pick the language of a multilingual match. If interested, I can dig. Paul -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Bens

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-14 Thread Paul Libbrecht
been taken up, it worked for me in Lucene 2.4.1. We used this to create an auto-completion popup which selected the right language by flagging the sub-query that matched best. paul On 14 Apr 2012 at 15:34, Benson Margulies wrote: > yes please > > On Apr 14, 2012, at 2:40

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-14 Thread Paul Libbrecht
Benson Margulies wrote: > On Sat, Apr 14, 2012 at 12:37 PM, Paul Libbrecht wrote: >> Benson, >> >> it was in the Lucene world in May 2010: >> >> http://mail-archives.apache.org/mod_mbox/lucene-java-user/201005.mbox/%3c469705.48901...@web29016.mail.ird.yahoo.

Re: Deciding whether to stem at query time

2012-04-24 Thread Paul Libbrecht
On 24 Apr 2012 at 17:16, Otis Gospodnetic wrote: > This would not necessarily increase the size of your index that much - you > don't have to store both fields, just 1 of them if you really need it for > highlighting or displaying. If not, just index. I second this. The query expansion process i

Re: Solr for routing a webapp

2012-04-26 Thread Paul Libbrecht
Have you tried using mod_rewrite for this? paul On 26 Apr 2012 at 15:16, Björn Zapadlo wrote: > Hello, > > I'm thinking about using a Solr index for routing a webapp. > > I have pregenerated base urls in my index. E.g. > /foo/bar1 > /foo/bar2 > /foo/bar3 > /foo/bar4 > /bar/foo1 > /bar/foo2

Re: Solr for routing a webapp

2012-04-26 Thread Paul Libbrecht
Or write your own query component, mapping /solr/* in the web.xml, exposing the request through a thread-local set by a filter, and reading it to set the appropriate query parameters... Performance-wise, this seems quite reasonable I think. paul On 26 Apr 2012 at 16:58, Paul Libbrecht wrote

Re: Removing old documents

2012-05-01 Thread Paul Libbrecht
I've been surprised to see Firefox cache JSON results even after an empty-cache was ordered... this is quite annoying but I have gotten used to it by doing the following when I need to debug: add an extra random parameter. But only when debugging! Using wget or curl showed me that the browser
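
The random-parameter trick can be sketched as follows. This is only an illustration under assumptions: the parameter name `_nocache` is made up, the URL is an example, and the server is assumed to ignore request parameters it does not know.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, param: str = "_nocache") -> str:
    """Return url with a fresh random value for the (hypothetical) param."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any earlier cache-buster value.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Each call yields a different URL, so the browser cannot serve it from cache.
print(with_cache_buster("http://localhost:8983/solr/select?q=*:*&wt=json"))
```

As said above, this is for debugging only: in production it would also defeat every cache that is actually wanted.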

Re: Removing old documents

2012-05-01 Thread Paul Libbrecht
With which client? paul On 2 May 2012 at 01:29, alx...@aim.com wrote: > all caching is disabled and I restarted jetty. The same results.

Re: Solritas in production

2012-05-07 Thread Paul Libbrecht
I do not share this reasoning at all. Of course a new UI would need to be developed, solr/itas is just an example. But that is precisely the appeal of solr/itas: a view system that is easy to tune. I do not feel, at all, that it means that it is not production ready. There are a zillion ways t

Re: Solritas in production

2012-05-08 Thread Paul Libbrecht
On 7 May 2012 at 13:30, Marcelo Carvalho Fernandes wrote: > Anything else? If you fear DoS attacks through overly large queries (e.g. if you have millions of documents), consider writing a query-component that can limit the queries. I believe that there's nothing else. paul
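
What "limit the queries" could mean is sketched below in plain Python; it is not a Solr QueryComponent. `rows` and `start` are the usual Solr paging parameters, while the limits `MAX_ROWS` and `MAX_START` and the function name are assumptions chosen for illustration. A real component would apply the same check to the request parameters before the query runs.

```python
MAX_ROWS = 100       # assumed upper bound on results per request
MAX_START = 10_000   # assumed upper bound on the paging offset


def check_paging(params: dict) -> dict:
    """Validate paging parameters and reject unreasonable values."""
    rows = int(params.get("rows", 10))
    start = int(params.get("start", 0))
    if rows < 0 or start < 0:
        raise ValueError("rows and start must not be negative")
    if rows > MAX_ROWS:
        raise ValueError(f"rows={rows} exceeds the limit of {MAX_ROWS}")
    if start > MAX_START:
        raise ValueError(f"start={start} exceeds the limit of {MAX_START}")
    # Return a copy with the parsed integer values.
    return {**params, "rows": rows, "start": start}


print(check_paging({"q": "solr", "rows": "20"}))
```

Rejecting the request, as opposed to silently clamping it, makes a misbehaving client visible in the logs; clamping would be the friendlier choice for a public search box.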

anticipating the indexing completion

2012-05-09 Thread Paul Libbrecht
Hello SOLR experts, I have my own indexing web-application which talks in XML to SOLR. It works wonderfully well. The queue is displayed in the indexer, so that experts can track that it went well into the index. However, I see no way currently to display that solr's searcher includes t

Re: anticipating the indexing completion

2012-05-09 Thread Paul Libbrecht
until the next autocommit" and then you could add that to the warmup time > from previous warming and estimate. > > Otis > > Performance Monitoring for Solr / ElasticSearch / HBase - > http://sematext.com/spm
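
The estimate suggested in the quoted text (time until the next autocommit plus the previous warmup time) can be written down as a small calculation. The sketch assumes that the autocommit fires a fixed interval after the oldest pending document was added and that the next warmup takes about as long as the last one; the function and parameter names are hypothetical.

```python
def estimate_seconds_until_searchable(
    seconds_since_oldest_pending_doc: float,
    autocommit_interval_seconds: float,
    last_warmup_seconds: float,
) -> float:
    """Rough estimate of when a pending document becomes searchable."""
    # Time left before the commit fires; zero if it is already due.
    until_commit = max(
        0.0, autocommit_interval_seconds - seconds_since_oldest_pending_doc
    )
    # The new searcher only serves requests once its warming has finished.
    return until_commit + last_warmup_seconds


# 20 s into a 60 s autocommit window, with a previous warmup of 15 s.
print(estimate_seconds_until_searchable(20, 60, 15))
```

This is only an estimate: warmup times vary with cache sizes and index changes, so the indexer UI should present it as approximate.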
