Re: DataImport TXT file entity processor
An EntityProcessor looks right to me. It may help us add more attributes if needed. PlainTextEntityProcessor looks like a good name. It can also be used to read html etc. --Noble

On Sat, Jan 24, 2009 at 12:37 PM, Shalin Shekhar Mangar wrote: > On Sat, Jan 24, 2009 at 5:56 AM, Nathan Adams wrote: > >> Is there a way to use Data Import Handler to index non-XML (i.e. simple >> text) files (either via HTTP or FileSystem)? I need to put the entire >> contents of a text file into a single field of a document and the other >> fields are being pulled out of Oracle... > > > Not yet. But I think it will be nice to have. Can you open an issue in Jira? > > I think importing from HTTP was something another user had asked for > recently. How do you get the url/path of this text file? That would help > decide if we need a Transformer or EntityProcessor for these tasks. > -- > Regards, > Shalin Shekhar Mangar. > -- --Noble Paul
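A hypothetical sketch of how such a processor could be wired into a data-config for the Oracle-plus-text-file case above. None of this existed when the thread was written: the entity and column names, the Oracle query, and the assumption that the processor exposes the file body under a single plainText column are all illustrative guesses.

    <dataConfig>
      <dataSource name="db" driver="oracle.jdbc.OracleDriver"
                  url="jdbc:oracle:thin:@//dbhost:1521/orcl" user="user" password="pass" />
      <dataSource name="fds" type="FileDataSource" />
      <document>
        <entity name="row" dataSource="db"
                query="select id, title, file_path from docs">
          <!-- hypothetical: reads the whole text file into one column -->
          <entity name="txt" dataSource="fds"
                  processor="PlainTextEntityProcessor"
                  url="${row.file_path}">
            <field column="plainText" name="content" />
          </entity>
        </entity>
      </document>
    </dataConfig>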
Re: Should I extend DIH to handle POST too?
That does not look like a great option. DIH looks like overkill for this use case. You can write a simple UpdateHandler to do that. All that you need to do is extend ContentStreamHandlerBase and register it as an UpdateHandler.

On Sat, Jan 24, 2009 at 12:34 PM, Shalin Shekhar Mangar wrote: > There's another option. Using DIH with Solrj. Take a look at: > > https://issues.apache.org/jira/browse/SOLR-853 > > There's a patch there but it hasn't been updated to trunk. A contribution > would be most welcome. > > On Sat, Jan 24, 2009 at 3:11 AM, Gunaranjan Chandraraju < > chandrar...@apple.com> wrote: > >> Hi >> I had earlier described my requirement of needing to 'post XMLs as-is' to >> SOLR and have it handled just as the DIH would do on import using the >> mapping in data-config.xml. I got multiple answers for the 'post approach' >> - the top two being >> >> - Use SOLR CELL >> - Use SOLRJ >> >> In general I would like to keep all the 'data conversion' inside the SOLR >> powered search system rather than having clients do the XSL and transforming >> the XML before sending them (CELL approach). >> >> My question is? How should I design this >> - Tomcat Servlet that provides this 'post' endpoint. Accepts the XML over >> HTTP, transforms it and calls SOLRJ to update. This is the same TOMCAT that >> houses SOLR. >> - SOLR Handler (Is this the right way?) >> - Take this a step further and implement it as an extension to DIH - a >> handler that will refer to DIH data-config xml and use the same >> transformation. This way I can invoke an import for 'batched files' or do a >> 'post 'for the same XML with the same data-config mapping being applied. >> Maybe it can be a separate handler that just refers to the same >> data-config.xml and not necessarily bundled with DIH handler code. >> >> Looking for some advise. If the DIH extension is the way to go then I >> would be happy to extend it and contribute that back to SOLR. >> >> Regards, >> Guna >> > > > > -- > Regards, > Shalin Shekhar Mangar. > -- --Noble Paul
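A rough sketch of the handler Noble describes, written against the Solr 1.3/1.4-era API (signatures have moved around between versions, and buildDoc() is a hypothetical placeholder for whatever XML-to-document transformation is needed):

    import java.io.Reader;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.common.util.ContentStream;
    import org.apache.solr.handler.ContentStreamHandlerBase;
    import org.apache.solr.handler.ContentStreamLoader;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class XmlPostHandler extends ContentStreamHandlerBase {
      @Override
      protected ContentStreamLoader newLoader(SolrQueryRequest req,
                                              final UpdateRequestProcessor processor) {
        return new ContentStreamLoader() {
          @Override
          public void load(SolrQueryRequest req, SolrQueryResponse rsp,
                           ContentStream stream) throws Exception {
            Reader xml = stream.getReader();
            SolrInputDocument doc = buildDoc(xml); // hypothetical transformation step
            AddUpdateCommand cmd = new AddUpdateCommand();
            cmd.solrDoc = doc;
            processor.processAdd(cmd); // runs the normal update chain
          }
        };
      }

      SolrInputDocument buildDoc(Reader xml) {
        SolrInputDocument doc = new SolrInputDocument();
        // ... parse the incoming XML and populate fields here ...
        return doc;
      }

      public String getDescription() { return "custom XML post handler"; }
      public String getSourceId() { return ""; }
      public String getSource() { return ""; }
      public String getVersion() { return ""; }
    }

It would then be registered in solrconfig.xml with something like <requestHandler name="/update/custom" class="XmlPostHandler" />.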
Re: Solr Replication: disk space consumed on slave much higher than on master
hi Jaco, We owe you a big THANK YOU. We were planning to roll out this feature into production in the next week or so. Our internal testing could not find this out. --Noble

On Fri, Jan 23, 2009 at 6:36 PM, Jaco wrote: > Hi, > > I have tested this as well, looking fine! Both issues are indeed fixed, and > the index directory of the slaves gets cleaned up nicely. I will apply the > changes to all systems I've got running and report back in this thread in > case any issues are found. > > Thanks for the very fast help! I usually need much, much more patience with > commercial software vendors.. > > Cheers, > > Jaco. > > > 2009/1/23 Noble Paul നോബിള് नोब्ळ् > >> I have opened an issue to track this >> https://issues.apache.org/jira/browse/SOLR-978 >> >> On Fri, Jan 23, 2009 at 5:22 PM, Noble Paul നോബിള് नोब्ळ् >> wrote: >> > I tested with the patch >> > it has solved both the issues >> > >> > On Fri, Jan 23, 2009 at 5:00 PM, Shalin Shekhar Mangar >> > wrote: >> >> >> >> >> >> On Fri, Jan 23, 2009 at 2:12 PM, Jaco wrote: >> >>> >> >>> Hi, >> >>> >> >>> I applied the patch and did some more tests - also adding some >> LOG.info() >> >>> calls in delTree to see if it actually gets invoked (LOG.info("START: >> >>> delTree: "+dir.getName()); at the start of that method). I don't see >> any >> >>> entries of this showing up in the log file at all, so it looks like >> >>> delTree >> >>> doesn't get invoked at all. >> >>> >> >>> To be sure, explaining the issue to prevent misunderstanding: >> >>> - The number of files in the index directory on the slave keeps >> increasing >> >>> (in my very small test core, there are now 128 files in the slave's >> index >> >>> directory, and only 73 files in the master's index directory) >> >>> - The directories index.x are still there after replication, but >> they >> >>> are empty >> >>> >> >>> Are there any other things I can do check, or more info that I can >> provide >> >>> to help fix this? >> >> >> >> The problem is that when we do a commit on the slave after replication >> is >> >> done, the commit does not re-open the IndexWriter. Therefore, the >> deletion >> >> policy does not take effect and older files are left as is. This can >> keep on >> >> building up. The only solution is to re-open the index writer. >> >> >> >> I think the attached patch can solve this problem. Can you try this and >> let >> >> us know? Thank you for your patience. >> >> >> >> -- >> >> Regards, >> >> Shalin Shekhar Mangar. >> >> >> > >> > >> > >> > -- >> > --Noble Paul >> > >> >> >> >> -- >> --Noble Paul >> > -- --Noble Paul
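For readers hitting the same symptom: the reason re-opening the writer matters is that Lucene only consults its IndexDeletionPolicy when an IndexWriter is opened or commits. A minimal sketch against the Lucene 2.4-era API (the index path is a placeholder):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
    import org.apache.lucene.store.FSDirectory;

    public class ReopenWriter {
      public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.getDirectory("/path/to/data/index");
        // Opening the writer runs the deletion policy: with the default
        // KeepOnlyLastCommitDeletionPolicy, files that belong only to older
        // commit points become eligible for deletion at this moment.
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false,
            new KeepOnlyLastCommitDeletionPolicy(),
            IndexWriter.MaxFieldLength.UNLIMITED);
        writer.close();
      }
    }

Until that happens, files from superseded commits simply accumulate on the slave's disk.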
Re: How to make Relationships work for Multi-valued Index Fields?
Hello, I am also a newbie and was wanting to do almost the exact same thing. I was planning on doing the equivalent of:-

<dataConfig>
 <dataSource type="FileDataSource" />
 <document>
  <entity name="f"
          processor="FileListEntityProcessor"
          baseDir="***"
          fileName=".*xml"
          rootEntity="false"
          dataSource="null">
   <entity name="record"
           processor="XPathEntityProcessor"
           stream="false"
           rootEntity="false"   ***changed***
           forEach="/record"
           url="${f.fileAbsolutePath}">
    ***change**
    <entity name="record_adr"
            processor="XPathEntityProcessor"
            stream="false"
            forEach="/record/address"
            url="${f.fileAbsolutePath}">
     <field column="address_street" xpath="/record/address/@street" />
     <field column="address_state" xpath="/record/address//@state" />
     <field column="address_type" xpath="/record/address//@type" />
    </entity>
   </entity>
  </entity>
 </document>
</dataConfig>

ID is no longer unique within Solr. There would be multiple "documents" with a given ID; one for each address. You can then search on ID and get the three addresses, you can also search on an address more sensibly. I have not been able to try this yet as other issues are still to be dealt with. Comments?

>Hi >I may be completely off on this being new to SOLR but I am not sure >how to index related groups of fields in a document and preserve >their 'grouping'. I would appreciate any help on this. Detailed >description of the problem below. > >I am trying to index an entity that can have multiple occurrences in >the same document - e.g. Address. The address could be Shipping, >Home, Office etc. Each address element has multiple values in it >like street, state etc. Thus each address element is a group with >the state and street in one address element being related to each other. > >It looks like this in my source xml > > > > > > > > >I have setup my DIH to treat these as entities as below > > > > >baseDir="***" > fileName=".*xml" > rootEntity="false" > dataSource="null" > > name="record" > processor="XPathEntityProcessor" > stream="false" > forEach="/record" >url="${f.fileAbsolutePath}"> > > > >name="record_adr" >processor="XPathEntityProcessor" >stream="false" >forEach="/record/address" >url="${f.fileAbsolutePath}"> > > xpath="/record/address//@state" /> > > > > > > > > >The problem is as follows. DIH seems to treat these as entities but >solr seems to flatten them out on indexing to fields in a document >(losing the entity part). > >So when I search for the an ID - in the response all the street fields >are bunched to-gather, followed by all the state fields type etc. >Thus I can't associate which street address corresponds to which >address type in the response. > >What seems harder is this - say I need to query on 'Street' = XYZ1 and >type="Office". This should NOT return a document since the street for >the office address is "XY2" and not "XYZ1". However when I query for >address_state:"XYZ1" and address_type:"Office" I get back this document. > >The problem seems to be that while DIH allows 'entities' within a >document the SOLR schema does not preserve them - it 'flattens' all >of them out as indices for the document. > >I could work around the problem by creating SOLR fields like >"home_address_street" and "office_address_street" and do some xpath >mapping. However I don't want to do it as we can have multiple >'other' addresses. Also I have other fields whose type is not easily >distinguished like address. > >As I mentioned being new to SOLR I might have completely goofed on a >way to set it up - much appreciate any direction on it. I am using >SOLR 1.3 > >Regards, >Guna

--
===
Fergus McMenemie Email:fer...@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===
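To make the intended result concrete, the index would then hold one Solr document per address, along these lines (field names and values are illustrative, using the street/type values from Guna's example; schema.xml must not declare ID as the uniqueKey, or each add would overwrite the previous one):

    <add>
      <doc>
        <field name="ID">R100</field>
        <field name="address_type">Home</field>
        <field name="address_street">XYZ1</field>
      </doc>
      <doc>
        <field name="ID">R100</field>
        <field name="address_type">Office</field>
        <field name="address_street">XY2</field>
      </doc>
    </add>

A query like address_type:Office AND address_street:XYZ1 then correctly matches nothing, because the street/type pairing is preserved per document.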
Re: How to make Relationships work for Multi-valued Index Fields?
nesting of an XPathEntityProcessor into another XPathEntityProcessor is possible only if a field in an xml is a filename/url . what is the purpose of nesting like this? is it because you have multiple addresses? the possible solutions are discussed elsewhere in this thread On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie wrote: > Hello, > > I am also a newbie and was wanting to do almost the exact same thing. > I was planning on doing the equivalent of:- > > > > >baseDir="***" > fileName=".*xml" > rootEntity="false" > dataSource="null" > >name="record" > processor="XPathEntityProcessor" > stream="false" > rootEntity="false"***changed*** > forEach="/record" > url="${f.fileAbsolutePath}"> > > ***change** > > name="record_adr" > processor="XPathEntityProcessor" > stream="false" > forEach="/record/address" > url="${f.fileAbsolutePath}"> > > xpath="/record/address//@state" /> > > > > > > > > ID is no longer unique within Solr, There would be multiple "documents" > with a given ID; one for each address. You can then search on ID and get > the three addresses, you can also search on an address more sensibly. > > I have not been able to try this yet as other issues are still to be > dealt with. > > Comments? > >>Hi >>I may be completely off on this being new to SOLR but I am not sure >>how to index related groups of fields in a document and preserver >>their 'grouping'. I would appreciate any help on this.Detailed >>description of the problem below. >> >>I am trying to index an entity that can have multiple occurrences in >>the same document - e.g. Address. The address could be Shipping, >>Home, Office etc. Each address element has multiple values in it >>like street, state etc.Thus each address element is a group with >>the state and street in one address element being related to each other. >> >>It looks like this in my source xml >> >> >> >> >> >> >> >> >>I have setup my DIH to treat these as entities as below >> >> >> >> >> > baseDir="***" >> fileName=".*xml" >> rootEntity="false" >> dataSource="null" > >> >name="record" >> processor="XPathEntityProcessor" >> stream="false" >> forEach="/record" >>url="${f.fileAbsolutePath}"> >> >> >> >> > name="record_adr" >>processor="XPathEntityProcessor" >>stream="false" >>forEach="/record/address" >>url="${f.fileAbsolutePath}"> >> >>> xpath="/record/address//@state" /> >> >> >> >> >> >> >> >> >>The problem is as follows. DIH seems to treat these as entities but >>solr seems to flatten them out on indexing to fields in a document >>(losing the entity part). >> >>So when I search for the an ID - in the response all the street fields >>are bunched to-gather, followed by all the state fields type etc. >>Thus I can't associate which street address corresponds to which >>address type in the response. >> >>What seems harder is this - say I need to query on 'Street' = XYZ1 and >>type="Office". This should NOT return a document since the street for >>the office address is "XY2" and not "XYZ1". However when I query for >>address_state:"XYZ1" and address_type:"Office" I get back this document. >> >>The problem seems to be that while DIH allows 'entities' within a >>document the SOLR schema does not preserve them - it 'flattens' all >>of them out as indices for the document. >> >>I could work around the problem by creating SOLR fields like >>"home_address_street" and "office_address_street" and do some xpath >>mapping. However I don't want to do it as we can have multiple >>'other' addresses. 
Also I have other fields whose type is not easily >>distinguished like address. >> >>As I mentioned being new to SOLR I might have completely goofed on a >>way to set it up - much appreciate any direction on it. I am using >>SOLR 1.3 >> >>Regards, >>Guna > > -- > > === > Fergus McMenemie Email:fer...@twig.me.uk > Techmore Ltd Phone:(UK) 07721 376021 > > Unix/Mac/Intranets Analyst Programmer > === > -- --Noble Paul
Re: Master failover - seeking comments
Did you look at the new in-built replication? http://wiki.apache.org/solr/SolrReplication#head-0e25211b6ef50373fcc2f9a6ad40380c169a5397 It can help you decide where to replicate from during runtime. Look at the snappull command; you can pass the masterUrl at the time of replication.

On Fri, Jan 23, 2009 at 7:55 PM, edre...@ha wrote: > > Thanks for the response. Let me clarify things a bit. > > Regarding the Slaves: > Our project is a web application. It is our desire to embedd Solr into the > web application. The web applications are configured with a local embedded > Solr instance configured as a slave, and a remote Solr instance configured > as a master. > > We have a requirement for real-time updates to the Solr indexes. Our > strategy is to use the local embedded Solr instance as a read-only > repository. Any time a write is made, we will send it to the remote Master. > Once a user pushes a write operation to the remote Master, all subsequent > read operations for this user now are made against the Master for the > duration of the session. This approximates "realtime" updates and seems to > work for our purposes. Writes to our system are a small percentage of Read > operations. > > Now, back to the original question. We're simply looking for failover > solution if the Master server goes down. Oh, and we are using the > replication scripts to sync the servers. > > > >> It seems like you are trying to write to Solr directly from your front end >> application. This is why you are thinking of multiple masters. I'll let >> others comment on how easy/hard/correct the solution would be. >> > > Well, yes. We have business requirements that want updates to Solr to be > realtime, or as close to that as possible, so when a user changes something, > our strategy was to save it to the DB and push it to the Solr Master as > well. Although, we will have a background application that will help ensure > that Solr is in sync with the DB for times that Solr is down and the DB is > not. > > > >> But, do you really need to have live writes? Can they be channeled through >> a >> background process? Since you anyway cannot do a commit per-write, the >> advantage of live writes is minimal. Moreover you would need to invest a >> lot >> of time in handling availability concerns to avoid losing updates. If you >> log/record the write requests to an intermediate store (or queue), you can >> do with one master (with another host on standby acting as a slave). >> > > We do need to have live writes, as I mentioned above. The concern you > mention about losing live writes is exactly why we are looking at a Master > Solr server failover strategy. We thought about having a backup Solr server > that is a Slave to the Master and could be easily reconfigured as a new > Master in a pinch. Our operations team has pushed us to come up with a > solution that would be more seamless. This is why we came up with a > Master/Master solution where both Masters are also slaves to each other. > > > >>> >>> To test this, I ran the following scenario. >>> >>> 1) Slave 1 (S1) is configured to use M2 as it's master. >>> 2) We push an update to M2. >>> 3) We restart S1, now pointing to M1. >>> 4) We wait for M1 to sync from M2 >>> 5) We then sync S1 to M1. >>> 6) Success! >>> >> >> How do you co-ordinate all this? >> > > This was just a test scenario I ran manually to see if the setup I described > above would even work. > > Is there a Wiki page that outlines typical web application Solr deployment > strategies? 
There are a lot of questions on the forum about this type of > thing (including this one). For those who have expertise in this area, I'm > sure there are many who could benefit from this (hint hint). > > As before, any comments or suggestions on the above would be much > appreciated. > > Thanks, > Erik > -- > View this message in context: > http://www.nabble.com/Master-failover---seeking-comments-tp21614750p21625324.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- --Noble Paul
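For reference, the kind of runtime redirection Noble mentions would look roughly like this (hosts and port are placeholders; in the trunk of that period the command was named snappull, later renamed fetchindex):

    http://slave_host:8983/solr/replication?command=snappull&masterUrl=http://standby_master:8983/solr/replication

so a slave can be told, per request, which master to pull from, without editing its solrconfig.xml.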
Re: Random queries extremely slow
Use multiple boxes, with a mirroring delay from one to another, like a pipeline.

2009/1/22 oleg_gnatovskiy > > Well this probably isn't the cause of our random slow queries, but might be > the cause of the slow queries after pulling a new index. Is there anything > we could do to reduce the performance hit we take from this happening? > > > > Otis Gospodnetic wrote: > > > > Here is one example: pushing a large newly optimized index onto the > > server. > > > > Otis > > -- > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > > > - Original Message > >> From: oleg_gnatovskiy > >> To: solr-user@lucene.apache.org > >> Sent: Thursday, January 22, 2009 2:22:51 PM > >> Subject: Re: Random queries extremely slow > >> > >> > >> What are some things that could happen to force files out of the cache > on > >> a > >> Linux machine? I don't know what kinds of events to look for... > >> > >> > >> > >> > >> yonik wrote: > >> > > >> > On Thu, Jan 22, 2009 at 1:46 PM, oleg_gnatovskiy > >> > wrote: > >> >> Hello. Our production servers are operating relatively smoothly most > >> of > >> >> the > >> >> time running Solr with 19 million listings. However every once in a > >> while > >> >> the same query that used to take 100 miliseconds takes 6000. > >> > > >> > Anything else happening on the system that may have forced some of the > >> > index files out of operating system disk cache at these times? > >> > > >> > -Yonik > >> > > >> > > >> > >> -- > >> View this message in context: > >> > http://www.nabble.com/Random-queries-extremely-slow-tp21610568p21611240.html > >> Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > > > -- > View this message in context: > http://www.nabble.com/Random-queries-extremely-slow-tp21610568p21611454.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Alexander Ramos Jardim
Re: Results not appearing
They all appear in the stats admin page under the NumDocs & maxDocs fields. I don't explicitly send a commit command, but my posting ends like this (suggesting they are committed):

SimplePostTool: POSTing file 21166.xml
SimplePostTool: POSTing file 21169.xml
SimplePostTool: COMMITting Solr index changes..

I just tried re-posting all the documents set as "text" -- will that update the current documents indexed? (bearing in mind the unique key, message-id, will be included again) When I try searching I still get 0 results for anything included in the message-id and content fields, both of which should be indexed and returning results... Cheers for any help!

ryguasu wrote: > > These might be obvious, but: > > * I assume you did a Solr commit command after indexing, right? > > * If you are using the fieldtype definitions from the default > schema.xml, then your "string" fields are not being analyzed, which > means you should expect search results only if you enter the entire, > exact value of one of the Message-ID or Date fields in your query. Is > that your intention? > > And yes, your analysis of "stored" seems correct. Stored fields are > those whose values you need back at query time, and indexed fields are > those you can do queries on. For a few complications, see > http://wiki.apache.org/solr/FieldOptionsByUseCase > > On Fri, Jan 23, 2009 at 8:04 PM, Johnny X > wrote: >> >> I've indexed my XML using the below in the schema: >> >> > required="true"/> >> >> >> >> >> > stored="true"/> >> > stored="true"/> >> > stored="true"/> >> >> >> >> >> >> >> >> >> >> Message-ID >> >> However searching via the Message-ID or Content fields returns 0. Using >> Luke >> I can still see these fields are stored however. >> >> Out of interest, by setting the other fields to just "stored=true", can >> they >> be returned in a query as part of a search? >> >> >> Cheers. >> -- >> View this message in context: >> http://www.nabble.com/Results-not-appearing-tp21637069p21637069.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Results-not-appearing-tp21637069p21640562.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to make Relationships work for Multi-valued Index Fields?
Hi Fergus, XPathEntityProcessor can read multivalued fields easily, e.g.:

<entity name="record_adr"
        processor="XPathEntityProcessor"
        stream="false"
        forEach="/record/address"
        url="${f.fileAbsolutePath}">
  <field column="address_street" xpath="/record/address/@street" />
  <field column="address_state" xpath="/record/address/@state" />
  <field column="address_type" xpath="/record/address/@type" />
</entity>

In this case all of address_street, address_state, address_type will be returned as separate lists while parsing. If you wish to put them into multiple fields you can write a transformer and iterate through the lists and put them into separate fields. If there are 3 tags then you get a List for each field where the length of the list == 3. If an item is missing it will be added as a null. Ensure that the fields are marked as multiValued="true" in the schema.xml. Otherwise it does not return a List. If there is no corresponding mapping in schema.xml you can explicitly put it here in the dataconfig.xml. I saw the syntax '/record/address//@state'. '//' is not supported. You will have to explicitly give the full path. --Noble

On Sat, Jan 24, 2009 at 2:57 PM, Noble Paul നോബിള് नोब्ळ् wrote: > nesting of an XPathEntityProcessor into another XPathEntityProcessor > is possible only if a field in an xml is a filename/url . > what is the purpose of nesting like this? > is it because you have multiple addresses? the possible solutions are > discussed elsewhere in this thread > > On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie wrote: >> Hello, >> >> I am also a newbie and was wanting to do almost the exact same thing. >> I was planning on doing the equivalent of:- >> >> >> >> >> > baseDir="***" >> fileName=".*xml" >> rootEntity="false" >> dataSource="null" > >> > name="record" >> processor="XPathEntityProcessor" >> stream="false" >> rootEntity="false"***changed*** >> forEach="/record" >> url="${f.fileAbsolutePath}"> >> >> ***change** >> >> > name="record_adr" >> processor="XPathEntityProcessor" >> stream="false" >> forEach="/record/address" >> url="${f.fileAbsolutePath}"> >> >> > xpath="/record/address//@state" /> >> >> >> >> >> >> >> >> ID is no longer unique within Solr, There would be multiple "documents" >> with a given ID; one for each address. You can then search on ID and get >> the three addresses, you can also search on an address more sensibly. >> >> I have not been able to try this yet as other issues are still to be >> dealt with. >> >> Comments? >> >>>Hi >>>I may be completely off on this being new to SOLR but I am not sure >>>how to index related groups of fields in a document and preserver >>>their 'grouping'. I would appreciate any help on this.Detailed >>>description of the problem below. >> >>>I am trying to index an entity that can have multiple occurrences in >>>the same document - e.g. Address. The address could be Shipping, >>>Home, Office etc. Each address element has multiple values in it >>>like street, state etc.Thus each address element is a group with >>>the state and street in one address element being related to each other. >>> >>>It looks like this in my source xml >>> >>> >>> >>> >>> >>> >>> >>>I have setup my DIH to treat these as entities as below >>> >>> >>> >>> >>> >> baseDir="***" >>> fileName=".*xml" >>> rootEntity="false" >>> dataSource="null" > >>> >>name="record" >>> processor="XPathEntityProcessor" >>> stream="false" >>> forEach="/record" >>>url="${f.fileAbsolutePath}"> >>> >>> >>> >>> >> name="record_adr" >>>processor="XPathEntityProcessor" >>>stream="false" >>>forEach="/record/address" >>>url="${f.fileAbsolutePath}"> >>> >>>>> xpath="/record/address//@state" /> >>> >>> >>> >>> >>> >>> >>> >>> >>>The problem is as follows. DIH seems to treat these as entities but >>>solr seems to flatten them out on indexing to fields in a document >>>(losing the entity part). 
>>> >>>So when I search for the an ID - in the response all the street fields >>>are bunched to-gather, followed by all the state fields type etc. >>>Thus I can't associate which street address corresponds to which >>>address type in the response. >>> >>>What seems harder is this - say I need to query on 'Street' = XYZ1 and >>>type="Office". This should NOT return a document since the street for >>>the office address is "XY2" and not "XYZ1". However when I quer
Re: Results not appearing
If it helps, everything appears when I use Luke to search through the index...but the search in that returns nothing either. When I search using the admin page for the word 'Phillip' (which appears the most in all of the documents) I get the following:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">phillip</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

Duh...?

Johnny X wrote: > > They all appear in the stats admin page under the NumDocs & maxDocs > fields. > > I don't explicitly send a commit command, but my posting ends like this > (suggesting they are commited): > > SimplePostTool: POSTing file 21166.xml > SimplePostTool: POSTing file 21169.xml > SimplePostTool: COMMITting Solr index changes.. > > I just tried re-posting all the documents set as "text" -- will that > update the current documents indexed? (bearing in mind the unique key, > message-id, will be included again) > > When I try searching I still get 0 results for anything included in the > message-id and content fields, both of which should be indexed and > returning results... > > > Cheers for any help! > > > ryguasu wrote: >> >> These might be obvious, but: >> >> * I assume you did a Solr commit command after indexing, right? >> >> * If you are using the fieldtype definitions from the default >> schema.xml, then your "string" fields are not being analyzed, which >> means you should expect search results only if you enter the entire, >> exact value of one of the Message-ID or Date fields in your query. Is >> that your intention? >> >> And yes, your analysis of "stored" seems correct. Stored fields are >> those whose values you need back at query time, and indexed fields are >> those you can do queries on. For a few complications, see >> http://wiki.apache.org/solr/FieldOptionsByUseCase >> >> On Fri, Jan 23, 2009 at 8:04 PM, Johnny X >> wrote: >>> >>> I've indexed my XML using the below in the schema: >>> >>> >> required="true"/> >>> >>> >>> >>> >>> >> stored="true"/> >>> >> stored="true"/> >>> >> stored="true"/> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> Message-ID >>> >>> However searching via the Message-ID or Content fields returns 0. Using >>> Luke >>> I can still see these fields are stored however. >>> >>> Out of interest, by setting the other fields to just "stored=true", can >>> they >>> be returned in a query as part of a search? >>> >>> >>> Cheers. >>> -- >>> View this message in context: >>> http://www.nabble.com/Results-not-appearing-tp21637069p21637069.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> > > -- View this message in context: http://www.nabble.com/Results-not-appearing-tp21637069p21641692.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr-duplicate post management
On Thu, Jan 22, 2009 at 2:33 PM, S.Selvam Siva wrote: > > > On Thu, Jan 22, 2009 at 7:12 AM, Chris Hostetter > wrote: > >> >> : what i need is ,to log the existing urlid and new urlid(of course both >> will >> : not be same) ,when a .xml file of same id(unique field) is posted. >> : >> : I want to make this by modifying the solr source.Which file do i need to >> : modify so that i could get the above details in log ? >> : >> : I tried with DirectUpdateHandler2.java(which removes the duplicate >> : entries),but efforts in vein. >> >> DirectUpdateHandler2.java (on the trunk) delegates to Lucene-Java's >> IndexWriter.updateDocument method when you have a uniqueKey and you aren't >> allowing duplicates -- this method doesn't give you any way to access the >> old document(s) that had that existing key. >> >> The easiest way to make a change like what you are interested in might be >> an UpdateProcessor that does a lookup/search for the uniqueKey of each >> document about to be added to see if it already exists. that's probably >> about as efficient as you can get, and would be nicely encapsulated. >> >> You might also want to take a look at SOLR-799, where some work is being >> done to create UpdateProcessors that can do "near duplicate" detection... >> >> http://wiki.apache.org/solr/Deduplication >> https://issues.apache.org/jira/browse/SOLR-799 >> >> >> >> >> >> >> -Hoss >> > >

Hi, i added some code to *DirectUpdateHandler2.java's doDeletions()* (solr 1.2.0), and got the solution i wanted (logging the duplicate post entry, i.e. the old field and new field of the duplicate post):

    Document d1 = searcher.doc(prev);         // existing doc to be deleted
    Document d2 = searcher.doc(tdocs.doc());  // new doc
    String oldname = d1.get("name");
    String id1 = d1.get("id");
    String newname = d2.get("name");
    String id2 = d2.get("id");                // the new doc's id
    out3.write(id1 + "," + oldname + "," + newname + "\n");

But I don't know whether the performance of Solr will be affected by this. Any comment on the performance issue for the above solution is welcome... -- Yours, S.Selvam
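For anyone who would rather not patch DirectUpdateHandler2, here is a sketch of the UpdateProcessor route Hoss suggested, written against the Solr 1.3-era trunk API (update processors do not exist in 1.2). The uniqueKey "id" and the logged "name" field are carried over from the snippet above, and writing to stdout is a placeholder for real logging:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.Term;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class DuplicateLogProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrIndexSearcher searcher = req.getSearcher();
            Object id = cmd.solrDoc.getFieldValue("id");
            int docId = searcher.getFirstMatch(new Term("id", id.toString()));
            if (docId != -1) { // a document with this id is already indexed
              Document old = searcher.doc(docId);
              System.out.println(id + "," + old.get("name") + ","
                  + cmd.solrDoc.getFieldValue("name"));
            }
            super.processAdd(cmd); // hand off to the rest of the update chain
          }
        };
      }
    }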
Re: faceting question
Is there no other way than to use the patch, since query A is a superset of B? If not doable, I will probably use some caching technique. Best.

On Sat, Jan 24, 2009 at 9:14 AM, Shalin Shekhar Mangar wrote: > On Sat, Jan 24, 2009 at 6:56 AM, Cam Bazz wrote: > >> Hello; >> >> I got a multiField named tagList which may contain multiple tags. I am >> making a query like: >> >> tagList:a AND tagList:b AND tagList:c >> >> and I am also getting a tagList facet returning me some values. >> >> What I would like is Solr to return me facets as if the query was: >> tagList:a AND tagList:b >> >> is it even possible? >> > > If I understand correctly, > 1. You want to query for tagList:a AND tagList:b AND tagList:c > 2. At the same time, you want to request facets for tagList but only for > tagList:a and tagList:b > > If that is correct, you can use the features introduced by > https://issues.apache.org/jira/browse/SOLR-911 > > However you may need to put #1 as fq instead of q. > -- > Regards, > Shalin Shekhar Mangar. >
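For the record, with that patch applied the request would look roughly like this (the tag name tc is arbitrary): the tagList:c filter is tagged and then excluded from the facet computation, so the tagList facet counts come out as if only tagList:a AND tagList:b had been applied.

    q=*:*
    &fq=tagList:a
    &fq=tagList:b
    &fq={!tag=tc}tagList:c
    &facet=true
    &facet.field={!ex=tc}tagList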
Re: Results not appearing
I should clarify that I misspoke before; I thought you had indexed="true" on Message-Id and Date, whereas you had it on Message-Id and Content. It sounds like you figured this out and interpreted my reply in a useful way nonetheless, though. So that's good. The post tool should be a valid way to commit. As for your technique of updating the field types and reindexing the documents, I think it should be fine provided you kept the field type for the Message-Id field as string. If you changed it to text along with the other field types, then there's a chance your "update" technique might instead have had the effect of inserting a duplicate copy of each document, so there are two copies of each document: one searchable and one not searchable. (I'm not totally sure about this, but it's a worry I would have.) That doesn't sound like what's happened to you, though. Could the problem be that you're not specifying which field to query? If you're using the standard query analyzer and the stock schema.xml, then the default field name is "text", whereas you don't have a field called "text" in your schema. In that setup if you want to search on the Content field you need to say so explicitly, like so: Content:phillip

On Sat, Jan 24, 2009 at 7:25 AM, Johnny X wrote: > > If it helps, everything appears when I use Luke to search through the > index...but the search in that returns nothing either. > > When I search using the admin page for the word 'Phillip' (which appears the > most in all of the documents) I get the following: > > > - > - > 0 > 0 > - > on > 0 > phillip > 10 > 2.2 > > > > > > > Duh...? > > > > Johnny X wrote: >> >> They all appear in the stats admin page under the NumDocs & maxDocs >> fields. >> >> I don't explicitly send a commit command, but my posting ends like this >> (suggesting they are commited): >> >> SimplePostTool: POSTing file 21166.xml >> SimplePostTool: POSTing file 21169.xml >> SimplePostTool: COMMITting Solr index changes.. >> >> I just tried re-posting all the documents set as "text" -- will that >> update the current documents indexed? (bearing in mind the unique key, >> message-id, will be included again) >> >> When I try searching I still get 0 results for anything included in the >> message-id and content fields, both of which should be indexed and >> returning results... >> >> >> Cheers for any help! >> >> >> ryguasu wrote: >>> >>> These might be obvious, but: >>> >>> * I assume you did a Solr commit command after indexing, right? >>> >>> * If you are using the fieldtype definitions from the default >>> schema.xml, then your "string" fields are not being analyzed, which >>> means you should expect search results only if you enter the entire, >>> exact value of one of the Message-ID or Date fields in your query. Is >>> that your intention? >>> >>> And yes, your analysis of "stored" seems correct. Stored fields are >>> those whose values you need back at query time, and indexed fields are >>> those you can do queries on. For a few complications, see >>> http://wiki.apache.org/solr/FieldOptionsByUseCase >>> >>> On Fri, Jan 23, 2009 at 8:04 PM, Johnny X >>> wrote: I've indexed my XML using the below in the schema: >>> required="true"/> >>> stored="true"/> >>> stored="true"/> >>> stored="true"/> Message-ID However searching via the Message-ID or Content fields returns 0. Using Luke I can still see these fields are stored however. Out of interest, by setting the other fields to just "stored=true", can they be returned in a query as part of a search? Cheers. 
-- View this message in context: http://www.nabble.com/Results-not-appearing-tp21637069p21637069.html Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> > > -- > View this message in context: > http://www.nabble.com/Results-not-appearing-tp21637069p21641692.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: Solr stemming -> preserve original words
I still don't understand your final goal, but if you want to get an output in the form of "run(40) => 20 from running, 10 from run, 8 from runners, 2 from runner" you need to index your documents using the standard analyzer, walk through the index using org.apache.lucene.index.IndexReader, and stem each term using a stemmer. Storing stems (key) and the original word list (value) in a map will give that kind of output.

However, if seeing something like the following list (not exactly what you want but similar) on schema.jsp will help you

run=>run
run=>running
run=>runner
run=>runners

add one line of code

newstr = newstr + "=>" + new String(termBuffer, 0, len);

to org.apache.solr.analysis.EnglishPorterFilterFactory.java between lines #116 and #117. Rename the file, compile the code, put your jar file in the lib directory under your Solr home. Now you can use your new FilterFactory in your schema.xml

--- On Sat, 1/24/09, Thushara Wijeratna wrote: > From: Thushara Wijeratna > Subject: Re: Solr stemming -> preserve original words > To: solr-user@lucene.apache.org, iori...@yahoo.com > Date: Saturday, January 24, 2009, 1:53 AM > Chris, Ahmet - thanks for the responses. > > Ahmet - yes, i want to see "run" as a top term + > the original words that > formed that term > The reason is that due to mis-stemming, the terms could > become non-english. > ex: "permanent" would stem to "perm", > "archive" would become "archiv". > > I need to extract a set of keywords from the indexed > content - I'd like > these to be correct full english words. > > thanks, > thushara
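A sketch of the IndexReader walk described in the first paragraph, against the Lucene 2.4-era API. The index path and the "content" field name are placeholders, and EnglishStemmer is the Snowball stemmer that EnglishPorterFilterFactory wraps:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import org.tartarus.snowball.ext.EnglishStemmer;

    public class StemReport {
      public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        EnglishStemmer stemmer = new EnglishStemmer();
        // stem -> (original surface form -> document frequency)
        Map<String, Map<String, Integer>> stems =
            new HashMap<String, Map<String, Integer>>();
        TermEnum terms = reader.terms();
        while (terms.next()) {
          Term t = terms.term();
          if (!"content".equals(t.field())) continue; // placeholder field name
          stemmer.setCurrent(t.text());
          stemmer.stem();
          String stem = stemmer.getCurrent();
          Map<String, Integer> originals = stems.get(stem);
          if (originals == null) {
            originals = new HashMap<String, Integer>();
            stems.put(stem, originals);
          }
          originals.put(t.text(), terms.docFreq());
        }
        terms.close();
        reader.close();
        // e.g. prints something like {running=20, run=10, runners=8, runner=2}
        System.out.println(stems.get("run"));
      }
    }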
size of solr update document a limitation?
Hello Solr experts, Is it good practice to post large Solr update documents (e.g. 100kb-2mb)? Will Solr do the necessary tricks to make the field use a reader instead of strings? thanks in advance paul
Re: Results not appearing
Thanks for the reply. I ended up fixing it by re-installing Tomcat and starting over. Searches now appear to work. Because I'm testing atm however, is it possible to delete the index and start afresh in future. At the moment I backed up the original index folder...if I just replace that with the current one including an index will that work...or will other parts of Solr recognise it's changed and as a result not work? What's the best solution for removing the index? Cheers. ryguasu wrote: > > I should clarify that I misspoke before; I thought you had > indexed="true" on Message-Id and Date, whereas you had it on > Message-Id and Content. It sounds like you figured this out and > interpreted my reply in a useful way nonetheless, though. So that's > good. > > The post tool should be a valid way to commit. > > As for your technique of updating the field types and reindexing the > documents, I think it should be fine provided you kept the field type > for the Message-Id field as string. If you changed it to text along > with the other field types, then there's a chance your "update" > technique might instead of the effect of inserting a duplicate copy of > each document, so there are two copies of each document, one > searchable, and one not searchable. (I'm not totally sure about this, > but it's a worry I would have.) That doesn't sound like what's > happened to you, though. > > Could the problem be that you're not specifying which field to query? > If you're using the standard query analyzer and the stock schema.xml, > then the default field name is "text", whereas you don't have a field > called "text" in your schema. In that setup if you want to search on > the Content field you need to say so explicitly, like so: > > Content:phillip > > On Sat, Jan 24, 2009 at 7:25 AM, Johnny X > wrote: >> >> If it helps, everything appears when I use Luke to search through the >> index...but the search in that returns nothing either. >> >> When I search using the admin page for the word 'Phillip' (which appears >> the >> most in all of the documents) I get the following: >> >> >> - >> - >> 0 >> 0 >> - >> on >> 0 >> phillip >> 10 >> 2.2 >> >> >> >> >> >> >> Duh...? >> >> >> >> Johnny X wrote: >>> >>> They all appear in the stats admin page under the NumDocs & maxDocs >>> fields. >>> >>> I don't explicitly send a commit command, but my posting ends like this >>> (suggesting they are commited): >>> >>> SimplePostTool: POSTing file 21166.xml >>> SimplePostTool: POSTing file 21169.xml >>> SimplePostTool: COMMITting Solr index changes.. >>> >>> I just tried re-posting all the documents set as "text" -- will that >>> update the current documents indexed? (bearing in mind the unique key, >>> message-id, will be included again) >>> >>> When I try searching I still get 0 results for anything included in the >>> message-id and content fields, both of which should be indexed and >>> returning results... >>> >>> >>> Cheers for any help! >>> >>> >>> ryguasu wrote: These might be obvious, but: * I assume you did a Solr commit command after indexing, right? * If you are using the fieldtype definitions from the default schema.xml, then your "string" fields are not being analyzed, which means you should expect search results only if you enter the entire, exact value of one of the Message-ID or Date fields in your query. Is that your intention? And yes, your analysis of "stored" seems correct. Stored fields are those whose values you need back at query time, and indexed fields are those you can do queries on. 
For a few complications, see http://wiki.apache.org/solr/FieldOptionsByUseCase On Fri, Jan 23, 2009 at 8:04 PM, Johnny X wrote: > > I've indexed my XML using the below in the schema: > > required="true"/> > > > > > stored="true"/> > stored="true"/> > indexed="false" > stored="true"/> > > > > > > > stored="true"/> > > > Message-ID > > However searching via the Message-ID or Content fields returns 0. > Using > Luke > I can still see these fields are stored however. > > Out of interest, by setting the other fields to just "stored=true", > can > they > be returned in a query as part of a search? > > > Cheers. > -- > View this message in context: > http://www.nabble.com/Results-not-appearing-tp21637069p21637069.html > Sent from the Solr - User mailing list archive at Nabble.com. > > >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Results-not-appearing-tp21637069p21641692.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Results-not-appearing-tp21637069p216
Re: Results not appearing
Without you stopping Solr itself, a solr client can remove all the documents in an index by doing a delete-by-query with the query "*:*" (without quotes). For XML interface clients, see http://wiki.apache.org/solr/UpdateXmlMessage. Solrj would have another way to do it. You'll need to do a commit after this to flush your changes. Alternatively, you can stop Solr and delete the whole data/ directory, which includes the index directory. If you do this, Solr will create a new fresh one the next time it starts up. For backups it might be a better habit to backup the data/ directory, rather than just the data/index directory. Assuming your schema.xml hasn't changed, then you should be able to restore one data/ directory with another. If you're changing your schema file, though, you need to make sure you restore a version of that file that is consistent with the one that you indexed with. On Sat, Jan 24, 2009 at 5:43 PM, Johnny X wrote: > > Thanks for the reply. > > I ended up fixing it by re-installing Tomcat and starting over. Searches now > appear to work. > > Because I'm testing atm however, is it possible to delete the index and > start afresh in future. > > At the moment I backed up the original index folder...if I just replace that > with the current one including an index will that work...or will other parts > of Solr recognise it's changed and as a result not work? > > What's the best solution for removing the index? > > > Cheers. > > > > ryguasu wrote: >> >> I should clarify that I misspoke before; I thought you had >> indexed="true" on Message-Id and Date, whereas you had it on >> Message-Id and Content. It sounds like you figured this out and >> interpreted my reply in a useful way nonetheless, though. So that's >> good. >> >> The post tool should be a valid way to commit. >> >> As for your technique of updating the field types and reindexing the >> documents, I think it should be fine provided you kept the field type >> for the Message-Id field as string. If you changed it to text along >> with the other field types, then there's a chance your "update" >> technique might instead of the effect of inserting a duplicate copy of >> each document, so there are two copies of each document, one >> searchable, and one not searchable. (I'm not totally sure about this, >> but it's a worry I would have.) That doesn't sound like what's >> happened to you, though. >> >> Could the problem be that you're not specifying which field to query? >> If you're using the standard query analyzer and the stock schema.xml, >> then the default field name is "text", whereas you don't have a field >> called "text" in your schema. In that setup if you want to search on >> the Content field you need to say so explicitly, like so: >> >> Content:phillip >> >> On Sat, Jan 24, 2009 at 7:25 AM, Johnny X >> wrote: >>> >>> If it helps, everything appears when I use Luke to search through the >>> index...but the search in that returns nothing either. >>> >>> When I search using the admin page for the word 'Phillip' (which appears >>> the >>> most in all of the documents) I get the following: >>> >>> >>> - >>> - >>> 0 >>> 0 >>> - >>> on >>> 0 >>> phillip >>> 10 >>> 2.2 >>> >>> >>> >>> >>> >>> >>> Duh...? >>> >>> >>> >>> Johnny X wrote: They all appear in the stats admin page under the NumDocs & maxDocs fields. 
I don't explicitly send a commit command, but my posting ends like this (suggesting they are commited): SimplePostTool: POSTing file 21166.xml SimplePostTool: POSTing file 21169.xml SimplePostTool: COMMITting Solr index changes.. I just tried re-posting all the documents set as "text" -- will that update the current documents indexed? (bearing in mind the unique key, message-id, will be included again) When I try searching I still get 0 results for anything included in the message-id and content fields, both of which should be indexed and returning results... Cheers for any help! ryguasu wrote: > > These might be obvious, but: > > * I assume you did a Solr commit command after indexing, right? > > * If you are using the fieldtype definitions from the default > schema.xml, then your "string" fields are not being analyzed, which > means you should expect search results only if you enter the entire, > exact value of one of the Message-ID or Date fields in your query. Is > that your intention? > > And yes, your analysis of "stored" seems correct. Stored fields are > those whose values you need back at query time, and indexed fields are > those you can do queries on. For a few complications, see > http://wiki.apache.org/solr/FieldOptionsByUseCase > > On Fri, Jan 23, 2009 at 8:04 PM, Johnny X > wrote: >> >> I've indexed my XML using the below in the schema: >> >> > required="true"/> >> >>
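Concretely, the delete-everything option described above is just two small XML messages POSTed to the /update handler (host and port are the usual example values):

    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
         --data-binary '<delete><query>*:*</query></delete>'
    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
         --data-binary '<commit/>'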
Re: How to make Relationships work for Multi-valued Index Fields?
I made this approach work with XPATH and XSL. However, this approach creates multiple fields like this

address_state_1
address_state_2
...
address_state_10

and

credit_card_1
credit_card_2
credit_card_3

How do I search for a credit card? The query syntax does not seem to support wildcards in field names; e.g. I can't seem to do this -> credit_card*:1234 4567 7890 1234

On the search side I would not know how many credit card fields got created for a document and so I need that to be dynamic. -g

On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote: Oops, one more gotcha. The dynamic field support is only in 1.4 trunk. On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju < chandrar...@apple.com> wrote: I have setup my DIH to treat these as entities as below I think the only way is to create a dynamic field for each attribute (street, state etc.). Write a transformer to copy the fields from your data config to appropriately named dynamic field (e.g. street_1, state_1, etc). To maintain this counter you will need to get/store it with Context#getSessionAttribute(name, val, Context.SCOPE_DOC) and Context#setSessionAttribute(name, val, Context.SCOPE_DOC). I cant't think of an easier way. -- Regards, Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar.
Re: How to make Relationships work for Multi-valued Index Fields?
For searching you need to put them in a single field. Use <copyField/> in schema.xml to achieve that.

On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju wrote: > I make this approach work with XPATH and XSL. However, this approach > creates multiple fields of like this > > address_state_1 > address_state_2 > ... > address_state_10 > > and > > credit_card_1 > credit_card_2 > credit_card_3 > > > How do I search for a credit_card.The query syntax does not seem to > support wild cards in field names. For e.g. I cant seem to do this -> > credit_card*:1234 4567 7890 1234 > > On the search side I would not know how many credit card fields got created > for a document and so I need that to be dynamic. > > -g > > > On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote: > >> Oops, one more gotcha. The dynamic field support is only in 1.4 trunk. >> >> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar < >> shalinman...@gmail.com> wrote: >> >>> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju < >>> chandrar...@apple.com> wrote: >>> I have setup my DIH to treat these as entities as below >>> baseDir="***" fileName=".*xml" rootEntity="false" dataSource="null" > >>> name="record" processor="XPathEntityProcessor" stream="false" forEach="/record" url="${f.fileAbsolutePath}"> >>> name="record_adr" processor="XPathEntityProcessor" stream="false" forEach="/record/address" url="${f.fileAbsolutePath}"> >>> xpath="/record/address/@street" /> >>> xpath="/record/address//@state" /> >>> xpath="/record/address//@type" /> >>> >>> I think the only way is to create a dynamic field for each attribute >>> (street, state etc.). Write a transformer to copy the fields from your >>> data >>> config to appropriately named dynamic field (e.g. street_1, state_1, >>> etc). >>> To maintain this counter you will need to get/store it with >>> Context#getSessionAttribute(name, val, Context.SCOPE_DOC) and >>> Context#setSessionAttribute(name, val, Context.SCOPE_DOC). >>> >>> I cant't think of an easier way. >>> -- >>> Regards, >>> Shalin Shekhar Mangar. >>> >> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. > > -- --Noble Paul
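Spelled out with the hypothetical field names from this thread (credit_card_all and the type are illustrative; the destination field needs multiValued="true" since several source fields copy into it):

    <dynamicField name="credit_card_*" type="text" indexed="true" stored="true" />
    <field name="credit_card_all" type="text" indexed="true" stored="false" multiValued="true" />
    <copyField source="credit_card_*" dest="credit_card_all" />

The search can then be written against the single field: credit_card_all:"1234 4567 7890 1234"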