Fwd: Using MLT Handler to find similar documents but also filter similar documents by a keyword.

2012-03-10 Thread Ravish Bhagdev
I would appreciate any comments or help on this. Thanks. Rav -- Forwarded message -- From: Ravish Bhagdev Date: Fri, Mar 2, 2012 at 12:12 AM Subject: Using MLT Handler to find similar documents but also filter similar documents by a keyword. To: solr-user@lucene.apache.org Hi,

Re: Xml representation of indexed document

2012-03-10 Thread Paul Libbrecht
Chamnap, that would be a view of the stored fields only (although Luke can do somewhat more to extract unstored fields). In my search projects I have an indexer, and that component (not DIH) can display an "indexed view" of a document. Maybe that helps. paul On 10 March 2012 at 08:57, Anupam Bhattacharya

how to ignore indexing of duplicated documents?

2012-03-10 Thread nagarjuna
Hi all, I am new to Solr. I would like to know how to avoid indexing duplicate documents. I have one table in a database with a single column that holds keywords, many of which are repeated. When I index it, every term in the database is indexed, including the repeats. I would like to ignore the index
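
A minimal sketch of one way to approach the question above, assuming the keywords can be deduplicated on the client before they are sent to Solr. The function name `unique_keywords` and the normalization (trim plus lowercase) are illustrative assumptions, not something from the thread. Solr also offers server-side options that are not shown here: documents sharing the same uniqueKey value overwrite each other, and a SignatureUpdateProcessorFactory can be configured for deduplication.

```python
from typing import Iterable, List


def unique_keywords(rows: Iterable[str]) -> List[str]:
    """Return keywords in first-seen order, dropping repeats.

    Normalization (strip + lowercase) is an assumption for illustration;
    adjust it to whatever "duplicate" means for your data.
    """
    seen = set()
    result = []
    for raw in rows:
        key = raw.strip().lower()
        # Skip empty values and anything already emitted.
        if not key or key in seen:
            continue
        seen.add(key)
        result.append(key)
    return result


if __name__ == "__main__":
    print(unique_keywords(["hotel", "bank", "Hotel ", "bank", "cafe"]))
```

Only the values returned by `unique_keywords` would then be turned into documents, so each keyword is indexed once.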

Re: Xml representation of indexed document

2012-03-10 Thread Chamnap Chhorn
Thanks Anupam and Paul. Yes, it can't display unstored fields. I can't find a way to extract unstored fields in Luke. Any idea? In your project, which indexer do you use? Previously, I wrote a Ruby script to index, but it took a lot of time. That's why I changed to DIH. Chamnap On Sat, Mar 1

Accessing other entities from DIH

2012-03-10 Thread Chamnap Chhorn
Hi all, I'm using DIH in Solr 3.5 to import data from MySQL. In my document, I have some fields: name, category, text_spell, ... text_spell is a multi-valued field that combines name and category (category is a multi-valued field as well). In this case, I would use ScriptTra

Re: Accessing other entities from DIH

2012-03-10 Thread Mikhail Khludnev
Hello, First of all, you can access the context, from which the parent entity fields can be obtained (quoting your link): The semantics of execution is same as that of a java transformer. The method can have two arguments as in 'transformRow(Map<String,Object> , Context context) in the abstract clas

Re: Xml representation of indexed document

2012-03-10 Thread Mikhail Khludnev
Hello, DIH has a cute interactive UI with debug/verbose features. Have you checked them? On Sat, Mar 10, 2012 at 10:57 AM, Chamnap Chhorn wrote: > Hi all, > > I'm doing data import using DIH in solr 3.5. I'm curious to know whether it > is possible to see the xml representation of indexed data from the brow

Re: Xml representation of indexed document

2012-03-10 Thread Paul Libbrecht
I made my own indexed-document representation using JDOM, then presented it in a web page. paul On 10 March 2012 at 12:08, Chamnap Chhorn wrote: > Thanks Anupam and Paul. > > Yes, it can't display unstored fields. I can't find the way to extract > unstored fields in Luke. Any idea? > In your proje

Faster Solr Indexing

2012-03-10 Thread Peyman Faratin
Hi, I am trying to index 12MM docs faster than is currently happening in Solr (using SolrJ). We have identified Solr's add method as the bottleneck (and not commit, which is tuned OK through mergeFactor and maxRamBufferSize and JVM RAM). Adding 1000 docs is taking approximately 25 seconds. We

Re: Accessing other entities from DIH

2012-03-10 Thread Chamnap Chhorn
Thanks Mikhail. Yeah, in this case CopyField is better. I can combine multiple fields into a new field, right? Something like this: Anyway, I might need to access the child entity and parent entity. Can you provide me some examples on how to use context? I'm not a java developer, it's a little

Re: Accessing other entities from DIH

2012-03-10 Thread Mikhail Khludnev
Chamnap, the Context approach is a somewhat experimental, as-is feature, and the only way to explore it is to use a debugger or be ready to debug the JavaScript manually. It is not well documented. The common approach is copyField. With Best Wishes. On Sat, Mar 10, 2012 at 8:24 PM, Chamnap Chhorn wrote: > Thanks Mikh

Vector based queries

2012-03-10 Thread Pat Ferrel
I have a case where I'd like to get documents which most closely match a particular vector. The RowSimilarityJob of Mahout is ideal for precalculating similarity between existing documents but in my case the query is constructed at run time. So the UI constructs a vector to be used as a query.
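
To make the question concrete, here is a minimal sketch of ranking documents against a query vector built at run time, assuming sparse term-weight vectors and cosine similarity. It is a brute-force illustration only, not how Mahout's RowSimilarityJob or Solr works internally; the names `cosine` and `rank`, and the choice of cosine as the measure, are assumptions.

```python
import math
from typing import Dict, List, Tuple

# A sparse vector: term -> weight (assumed representation).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: Vector, docs: Dict[str, Vector], top_n: int = 10) -> List[Tuple[str, float]]:
    """Score every document against the query and return the best top_n."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    docs = {"a": {"solr": 1.0, "join": 0.5}, "b": {"mahout": 1.0}}
    print(rank({"solr": 2.0}, docs, top_n=1))
```

Scoring every document is linear in the collection size, so this only illustrates the intended result; an index-backed query would be needed at scale.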

Re: Xml representation of indexed document

2012-03-10 Thread Chamnap Chhorn
Mikhail, the DIH interactive UI doesn't look good to me because I can't see the XML of indexed documents. I need to see it to make sure I'm doing it right. How do you make sure you're doing it right using the DIH interactive UI? On Sat, Mar 10, 2012 at 7:14 PM, Mikhail Khludnev < mkhlud...@griddynamics.com> wr

How to index a single big file?

2012-03-10 Thread neosky
Hello, I have a great challenge here. I have a big file (1.2 GB) with more than 200 million records that I need to index. Later it might grow to more than 9 GB with more than 1000 million records. Each record contains 3 fields. I am quite new to Solr and Lucene, so I have some questions: 1. It seems that solr

Re: How to index a single big file?

2012-03-10 Thread Grant Ingersoll
On Mar 10, 2012, at 1:52 PM, neosky wrote: > Hello, I have a great challenge here. I have a big file(1.2G) with more than > 200 million records need to index. It might more than 9 G file with more > than 1000 million record later. > One record contains 3 fields. I am quite newer for solr and luc

3 Way Solr Join . . ?

2012-03-10 Thread Angelina Bola
Does "Solr" support a 3-way join? i.e. http://wiki.apache.org/solr/Join (I have the 2-way join working) For example, I am pulling 3 different tables from an RDBMS into one Solr core: Table#1: Customers (parent table) Table#2: Addresses (child table with foreign key to customers) Tabl

Re: 3 Way Solr Join . . ?

2012-03-10 Thread Walter Underwood
Fields can be multi-valued. Put multiple phone numbers in a field and match all of them. wunder On Mar 10, 2012, at 4:58 PM, Angelina Bola wrote: > Does "Solr" support a 3-way join? i.e. > http://wiki.apache.org/solr/Join (I have the 2-way join working) > > For example, I am pulling 3 differe

Re: Stemmer Question

2012-03-10 Thread Jamie Johnson
Barring the horrible name I am wondering if folks would be interested in having something like this as an alternative to the standard kstemmer. This is largely based on the SynonymFilter except it builds tokens using the kstemmer and the original input. I've created a JIRA for this to start discu

More explanation on row in DIH

2012-03-10 Thread Chamnap Chhorn
Hi all, Could anyone please explain what a row is in DIH? Let's say a listing can have multiple keyphrase_assets. A keyphrase_asset is a comma-separated value ("hotel,bank,..."). I need to index it and split it by comma into a multi-valued keyphrase field. function fKeyphrasePosition(row) { } Theref
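
The splitting described above can be sketched as follows. This is plain Python showing only the logic, under the assumption that each keyphrase_asset arrives as one comma-separated string; the function name `merge_keyphrases` is hypothetical. Inside DIH itself the same step would live in a ScriptTransformer function, or could be handled with the RegexTransformer `splitBy` attribute.

```python
from typing import Iterable, List


def merge_keyphrases(assets: Iterable[str]) -> List[str]:
    """Flatten several comma-separated keyphrase_asset strings into one
    list of values suitable for a multi-valued keyphrase field."""
    values: List[str] = []
    for asset in assets:
        for part in asset.split(","):
            part = part.strip()
            # Ignore empty pieces produced by stray commas.
            if part:
                values.append(part)
    return values


if __name__ == "__main__":
    print(merge_keyphrases(["hotel,bank", "atm, cafe,"]))
```

Each element of the returned list would become one value of the multi-valued field.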

Re: Vector based queries

2012-03-10 Thread Lance Norskog
Look at the MoreLikeThis feature in Lucene. I believe it does roughly what you describe. On Sat, Mar 10, 2012 at 9:58 AM, Pat Ferrel wrote: > I have a case where I'd like to get documents which most closely match a > particular vector. The RowSimilarityJob of Mahout is ideal for > precalculating

Re: 3 Way Solr Join . . ?

2012-03-10 Thread William Bell
Yeah I am a bit afraid when people want to use the join() feature. To get good performance you really need to try to stick to the recommendation of denormalizing your database into multiValued search fields. You can also use external fields, or store formatted info into a String field in json or x

Re: does solr have a mechanism for intercepting requests - before they are handed off to a request handler

2012-03-10 Thread William Bell
Why not wrap the call into a service and then call the right handler? On Fri, Mar 9, 2012 at 10:11 AM, geeky2 wrote: > hello all, > > does solr have a mechanism that could intercept a request (before it is > handed off to a request handler). > > the intent (from the business) is to send in a gene

Re: Knowing which fields matched a search

2012-03-10 Thread William Bell
debugQuery tells you. On Fri, Mar 9, 2012 at 1:05 PM, Russell Black wrote: > When searching across multiple fields, is there a way to identify which > field(s) resulted in a match without using highlighting or stored fields? -- Bill Bell billnb...@gmail.com cell 720-256-8076

Re: Lucene vs Solr design decision

2012-03-10 Thread William Bell
Great answer Robert. On Fri, Mar 9, 2012 at 12:06 PM, Robert Stewart wrote: > Split up index into say 100 cores, and then route each search to a specific > core by some mod operator on the user id: > > core_number = userid % num_cores > > core_name = "core"+core_number > > That way each index co
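
The routing rule quoted above can be sketched as a small helper. The function name `core_name` and the input validation are assumptions added for illustration; the formula itself (user id modulo the number of cores, prefixed with "core") is taken from the quoted message.

```python
def core_name(user_id: int, num_cores: int) -> str:
    """Pick the core for a user: core_number = userid % num_cores."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    core_number = user_id % num_cores
    return "core" + str(core_number)


if __name__ == "__main__":
    # With 100 cores, user 12345 is routed to core 45.
    print(core_name(12345, 100))
```

Because the mapping depends on `num_cores`, changing the number of cores later moves users to different cores, so existing data would need to be reindexed or migrated.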