Sven
In my data-config.xml I have the following
In my schema.xml I have
And in my solrconfig.xml I have
data-config.xml
dismax
Please unsubscribe me from the mailing list.
Shalin Shekhar Mangar wrote:
On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler <
xavier.schep...@sciences-po.fr> wrote:
Hey,
I'm thinking about using dynamic fields.
I need one or more user-specific fields in my schema, for example,
"concept_user_*", and I will have maybe more than 200 users.
I understand that Tika is able to index PDF content: is that true? I tried to
post a PDF from local and I've seen another document in the solr/admin schema
browser, but when I search only the document id is available; the document
doesn't seem to be indexed. Do I need other products to index PDF content?
On Tue, Feb 9, 2010 at 2:43 PM, Xavier Schepler <
xavier.schep...@sciences-po.fr> wrote:
> Shalin Shekhar Mangar wrote:
>
> On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler <
>> xavier.schep...@sciences-po.fr> wrote:
>>
>>
>>
>>> Hey,
>>>
>>> I'm thinking about using dynamic fields.
>>>
>>> I n
Shalin Shekhar Mangar wrote:
On Tue, Feb 9, 2010 at 2:43 PM, Xavier Schepler <
xavier.schep...@sciences-po.fr> wrote:
Shalin Shekhar Mangar wrote:
On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler <
xavier.schep...@sciences-po.fr> wrote:
Hey,
I'm thinking about using
Hi,
I am having problems getting the delta-import to work for my schema.
Following what I have found in the list, JIRA, and the wiki, the
configuration below should just work, but it doesn't.
The SQL generated in the delta query is correct, the times
I don't use any garbage collection parameters.
/Tim
2010/2/8 Simon Rosenthal :
> What garbage collection parameters is the JVM using? The memory will not
> always be freed immediately after an event like unloading a core or starting
> a new searcher.
>
> 2010/2/8 Tim Terlegård
>
>> To me it d
OK, I'm going ahead (maybe :)).
I tried another curl command to send the file from remote:
http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf
and the behaviour has changed: now I get an error in the Solr log file:
HTTP St
If I unload the core and then click "Perform GC" in jconsole nothing
happens. The 8 GB RAM is still used.
If I load the core again and then run the query with the sort fields,
then jconsole shows that the memory usage immediately drops to 1 GB
and then rises to 8 GB again as it caches the stuff.
Which version of Solr/Lucene are you using?
Can you run Lucene's CheckIndex tool (java -ea:org.apache.lucene
org.apache.lucene.index.CheckIndex /path/to/index) and then post the
output?
Have you altered any of IndexWriter's defaults (via solrconfig.xml)?
Eg the termIndexInterval?
Mike
On Mon, F
Hi all,
I need logic in Solr to join two fields in a query.
I indexed two fields: id and body (text type).
5 rows are indexed:
id=1 : text= nokia samsung
id=2 : text= sony vaio nokia samsung
id=3 : text= vaio nokia
etc..
I am searching with "q=id:1" and it returns results perfectly, returning "n
Try this:
deltaImportQuery="select id, bytes from attachment where application =
'MYAPP' and id = '${dataimporter.delta.id}'"
Be aware that the names are case-sensitive. If the id comes as 'ID',
this will not work.
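For context, a minimal data-config.xml sketch showing how deltaQuery and deltaImportQuery fit together (the table and columns come from the snippet above; the last_modified column and the dataSource settings are assumptions for illustration):

<dataConfig>
  <dataSource driver="org.h2.Driver" url="jdbc:h2:./data/db"/>  <!-- illustrative values -->
  <document>
    <!-- deltaQuery returns the primary keys of changed rows;
         deltaImportQuery then fetches each row by ${dataimporter.delta.id} -->
    <entity name="attachment"
            query="select id, bytes from attachment where application = 'MYAPP'"
            deltaQuery="select id from attachment where application = 'MYAPP'
                        and last_modified > '${dataimporter.last_index_time}'"
            deltaImportQuery="select id, bytes from attachment where application = 'MYAPP'
                              and id = '${dataimporter.delta.id}'"/>
  </document>
</dataConfig>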
On Tue, Feb 9, 2010 at 3:15 PM, Jorg Heymans wrote:
> Hi,
>
> I am having probl
You can also try:
URL urlo = new URL(url); // ensure that the url has wt=javabin in it
NamedList namedList = new JavaBinCodec().unmarshal(urlo.openConnection().getInputStream());
QueryResponse response = new QueryResponse(namedList, null);
On Mon, Feb 8, 2010 at 11:49 PM, Jason Rutherglen
wrote
> I am searching for "nokia" and it lists results 1, 2, 3 with a short
> description field. There is a link on the search list (like Google);
> clicking on the link performs a new search (opening the doc from the
> index), for this search
>
> I want to join two fields:
> id:1 + queryString ("nokia samsung") t
Hi,
I'd like to know if it's possible to have a Solr server with a schema and, let's
say, 10 fields indexed.
I now want to replicate this whole index to another Solr server which has a
slightly different schema.
There are 6 additional fields; these fields change the sort order for a product
which ba
Hi Ahmet,
Thank you very much, my problem is solved.
With regards,
On Tue, Feb 9, 2010 at 5:38 PM, Ahmet Arslan wrote:
>
> > I am searching for "nokia" and it lists results 1, 2, 3
> > with a short
> > description field.
> > There is a link on the search list (like Google); clicking on
> > the link perfo
Hi,
I'm trying to improve the search box on our website by adding an
autosuggest field. The dataset is a set of properties in the world
(mostly Europe) and the search box is intended to be filled with a
country, region, or city name. To do this I've created a separate,
simple core with one do
With the new sort by function in 1.5
(http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function), will it now be
possible to include the ExternalFileField value in the sort formula? If so, we
could sort on last bid price or last bid time without updating the document
itself.
However, to displ
I have been using distributed search with haproxy but noticed that I am
suffering a little from TCP connections building up waiting for the OS-level
close/timeout:
netstat -a
...
tcp6 1 0 10.0.16.170%34654:53789 10.0.16.181%363574:8893
CLOSE_WAIT
tcp6 1 0 10.0.16.170%34654
> I'm trying to improve the search box on our website by
> adding an autosuggest field. The dataset is a set of
> properties in the world (mostly Europe) and the search box is
> intended to be filled with a country, region, or city name.
> To do this I've created a separate, simple core with one
>
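Ahmet's reply is cut off above; for what it's worth, one common recipe for this kind of autosuggest core (an assumption on my part, not necessarily what he went on to suggest) is an edge n-gram field type, so partial input matches name prefixes:

<fieldType name="autosuggest" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>  <!-- whole name as one token -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "par" then matches "Paris"; gram sizes are illustrative -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>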
Hi,
I did not try this, but could you not read the URL client side and pass it to
SolrJ as a ContentStream?
ContentStream urlStream = new ContentStreamBase.URLStream(new URL("http://my.site/file.html"));
req.addContentStream(urlStream);
--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com
On 2/9/2010 2:57 PM, Ahmet Arslan wrote:
I'm trying to improve the search box on our website by
adding an autosuggest field. The dataset is a set of
properties in the world (mostly Europe) and the search box is
intended to be filled with a country, region, or city name.
To do this I've created a
NOTE: Please start a new email thread for a new topic (See
http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking)
Your strategy could work. You might want to look into dedicated entity
extraction frameworks like
http://opennlp.sourceforge.net/
http://nlp.stanford.edu/software/CRF-NER.shtml
Indeed, that made it work. Looking back at the documentation, it's all there,
but one needs to read every single line with care :-)
2010/2/9 Noble Paul നോബിള് नोब्ळ्
> try this
>
> deltaImportQuery="select id, bytes from attachment where application =
> 'MYAPP' and id = '${dataimporter.delta.id}
Much more efficient to tag documents with language at index time. Look for
language identification tools such as
http://www.sematext.com/products/language-identifier/index.html or
http://ngramj.sourceforge.net/ or
http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/Languag
You may also want to play with other highlighting parameters to select how much
text to highlight, how many fragments, etc. See
http://wiki.apache.org/solr/HighlightingParameters
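As a sketch, such parameters can also be set as request-handler defaults in solrconfig.xml (the handler name, field, and values below are illustrative assumptions):

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">text</str>            <!-- field(s) to highlight -->
    <str name="hl.snippets">3</str>         <!-- max fragments per field -->
    <str name="hl.fragsize">100</str>       <!-- fragment size in characters -->
    <str name="hl.maxAnalyzedChars">51200</str>  <!-- how much text to analyze -->
  </lst>
</requestHandler>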
--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com
On 9. feb. 2010, at 13.08, Ahmet Arslan
Hi,
Index replication in Solr makes an exact copy of the original index.
Is it not possible to add the 6 extra fields to both instances?
An alternative to replication is to feed two independent Solr instances -> full
control :)
Please elaborate on your specific use case if this is not useful answ
Tim,
The GC just automagically works right?
:)
There have been issues around thread locals in Lucene. The main code for
core management is CoreContainer, which I believe is fairly easy to
digest. If there's an issue you may find it there.
Jason
2010/2/9 Tim Terlegård :
> If I unload the core and
Hello Everyone,
I have a field in my Solr schema which stores emails. The way I want the
emails to be tokenized is like this:
if the email address is abc.def@alpha-xyz.com,
users should be able to search on
1. abc.def@alpha-xyz.com (whole address)
2. abc
3. def
4. alpha-xyz
Which tokenizer should
Thanks Lance and Michael,
We are running Solr 1.3.0.2009.09.03.11.14.39 (Complete version info from
Solr admin panel appended below)
I tried running CheckIndex (with the -ea switch) on one of the shards.
CheckIndex also produced an ArrayIndexOutOfBoundsException on the larger
segment contai
I tried your suggestion, Hoss, but committing to the new coordinator
core doesn't change the indexVersion and therefore the ETag value isn't
changed.
I opened a new JIRA issue for this
http://issues.apache.org/jira/browse/SOLR-1765
Thanks,
Charlie
-----Original Message-----
From: Chris Hostett
Yes, the term count reported by CheckIndex is the total number of unique terms.
It indeed looks like you are exceeding the unique term count limit --
16777214 * 128 (= the default term index interval) is 2147483392 which
is mighty close to max/min 32 bit int value. This makes sense,
because Check
I opened a Lucene issue w/ patch to try:
https://issues.apache.org/jira/browse/LUCENE-2257
Tom let me know if you're able to test this... thanks!
Mike
On Tue, Feb 9, 2010 at 2:09 PM, Michael McCandless
wrote:
> Yes, the term count reported by CheckIndex is the total number of unique
> term
Thanks Michael,
I'm not sure I understand. CheckIndex reported a negative number:
-16777214.
But in any case we can certainly try running CheckIndex from a patched
Lucene. We could also run a patched Lucene on our dev server.
Tom
Yes, the term count reported by CheckIndex is the total
I attached a patch to the issue that may fix it.
Maybe start by running CheckIndex first?
Mike
On Tue, Feb 9, 2010 at 2:56 PM, Tom Burton-West wrote:
>
> Thanks Michael,
>
> I'm not sure I understand. CheckIndex reported a negative number:
> -16777214.
>
> But in any case we can certainly try
On Tue, Feb 9, 2010 at 2:56 PM, Tom Burton-West wrote:
> I'm not sure I understand. CheckIndex reported a negative number:
> -16777214.
Right, we are overflowing the positive ints, which wraps around to the
smallest int (-2.1 billion); dividing that by 128 gives ~ -16777216.
Lucene has an array
I know this is not Drupal-specific, but I thought this question may be more
about the Solr query.
For instance, I pulled down Lucid Imagination's Solr install, just like the
Apache Solr install, and ran the example Solr and loaded the documents from
the exampledocs.
I can go to:
http://localhost:8983/solr/ad
Hi,
To match 1, 2, 3, 4 below you could use a fieldtype based on TextField, with
just a simple WordDelimiterFilterFactory. However, this would also match abc-def,
def.alpha, xyz-com and a...@def, because all punctuation is treated the same.
To avoid this, you could do some custom handling of "-", "."
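A minimal sketch of such a field type (the attribute values are illustrative assumptions):

<fieldType name="email" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits abc.def@alpha-xyz.com into abc, def, alpha, xyz, com;
         preserveOriginal also keeps the whole address as a token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A query like "alpha-xyz" gets the same treatment at query time and so matches through its subwords.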
Hi,
I'm using the /itas requestHandler, and would like to add spell-check
suggestions to the output.
I have spell-check configured and working in the XML response writer, but
nothing is output in Velocity. Debugging the JSON $response object, I cannot
find any representation of spellcheck r
Hello,
One of the commercial search platforms I work with has the concept of
'document vectors', which are 1-gram and 2-gram phrases and their
associated tf/idf weights on a 0-1 scale, i.e. ["banana pie", 0.99]
means banana pie is very relevant for this document.
During the ingest/indexing proces
> I've been looking at the Solr TermVectorComponent
> (http://wiki.apache.org/solr/TermVectorComponent) and it
> seems to have
> something similar to this, but it looks to me like this is
> a component
> that is processed at query time (?) and is limited to
> 1-gram terms.
If you use it, it can give
We are using Solr 1.4 in a multi-core setup with replication.
Whenever we write to the master we get the following exception:
java.lang.RuntimeException: after flush: fdx size mismatch: 1285 docs vs 0
length in bytes of _gqg.fdx file exists?=false
at
org.apache.lucene.index.StoredFieldsWriter.cl
The class was added in 2007 and hasn't changed. I don't know if anyone uses it.
Presumably sort-by-function will use it.
On Tue, Feb 9, 2010 at 5:59 AM, Jan Høydahl / Cominvent
wrote:
> With the new sort by function in 1.5
> (http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function), will it
Thank you Ahmet, this is exactly what I was looking for. Looks like
the shingle filter can produce 3+-gram terms as well, that's great.
I'm going to try this with both western and CJK language tokenizers
and see how it turns out.
On Tue, Feb 9, 2010 at 5:07 PM, Ahmet Arslan wrote:
>> I've been l
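For reference, a field type sketch using the shingle filter (the gram size and analyzer chain are illustrative assumptions); with termVectors="true" on a field of this type, TermVectorComponent can report tf/idf for the shingled phrases as well:

<fieldType name="shingled_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit the original unigrams plus 2- and 3-word shingles -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>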
Hi All,
I'm trying to create an index of documents where I am trying to associate each
document with a set of related keywords, each with an individual boost value
that I compute externally.
eg:
Document Title: Democrats
related keywords:
liberal: 4.0
politics: 1.5
obama: 2.0
A couple of minor problems:
The qt parameter (cue tee) selects the request handler that processes the q
(query) parameter. I think you mean 'qf':
http://wiki.apache.org/solr/DisMaxRequestHandler#qf_.28Query_Fields.29
Another problem with atomID, atomId, atomid: Solr field names are
case-sensitive. I don't kn
stream.file= means read a local file from the server that Solr runs
on; it has to be a complete path that works on that server. To upload
the file over HTTP you have to use @filename to have curl open it.
That path has to work from the machine you run curl on, and relative
paths work.
Also, Tika d
This goes through the Apache Commons HTTP client library:
http://hc.apache.org/httpclient-3.x/
We used 'balance' at another project and did not have any problems.
On Tue, Feb 9, 2010 at 5:54 AM, Ian Connor wrote:
> I have been using distributed search with haproxy but noticed that I am
> sufferi
Hello *, quick question: what would I have to change in the query
parser to allow wildcarded terms to go through text analysis?
To select the whole string, I think you want hl.fragmenter=regex and
a regex pattern that matches your entire strings:
http://www.lucidimagination.com/search/document/CDRG_ch07_7.9?q=highlighter+multi-valued
This will let you select the entire string field. But I don't know how
to avoid the non-
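As a sketch, the relevant parameters would look something like the following as request-handler defaults (the pattern and limit are illustrative assumptions; tune them to your data):

<lst name="defaults">
  <str name="hl">true</str>
  <str name="hl.fragmenter">regex</str>
  <!-- treat the whole field value as one fragment -->
  <str name="hl.regex.pattern">.*</str>
  <str name="hl.regex.maxAnalyzedChars">10000</str>
</lst>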
That's what I was going to look up :)
The nutch thing works reasonably well. It comes with a training
database from various languages. It had some UTF-8 problems in the
files. The trick here is to come up with a balanced volume of text for
all languages so that one language's patterns do not overw
The admin/form.jsp is supposed to prepopulate fl= with '*,score' which
means bring back all fields and the calculated relevance score.
This is the Drupal search, decoded. I changed the %2B to + signs for
readability. Have a look at the filter query fq= and the facet date
range.
Also, in Solr 1.4
We need more information. How big is the index in disk space? How many
documents? How many fields? What's the schema? What OS? What Java
version?
Do you run this on a local hard disk or is it over an NFS mount?
Does this software commit before shutting down?
If you run with asserts on do you get
Thank you! It works very well.
I think that the field type you suggested will also index words like DOT, AT,
and com.
In order to prevent these words from getting indexed, I have changed the
field type to
On Wed, Feb 10, 2010 at 10:09 AM, Lance Norskog wrote:
>
> Thanks for the pointer to ngramj (LGPL license), which then leads to
> another contender, http://tcatng.sourceforge.net/ (BSD license). The
> latter would make a great DIH Transformer that could go into contrib/
> (hint hint).
>
>