Boost, weight, proximity, ranking which one?

2010-09-03 Thread javaxmlsoapdev

I am using solr 1.4 version.

I have a requirement to show first all documents that matched the most words
from a free-text search string. For example, if a user searches for the two
words "connectivity breakup" (entered without quotes), my search results
should display all documents where both words matched first, and then those
docs where only one or the other of the two search words matched.

Those two search words could also be present in two separate fields of the
Solr index; in that case I also need to rank that document higher.

Can someone explain how this can be achieved in Solr?
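
For what it's worth, the dismax handler in Solr 1.4 gives this behavior largely for free: each matching term contributes to the score, so documents matching both words outrank documents matching only one, and qf lets several fields contribute at once. A minimal sketch (handler name, field names, and boosts are placeholders, not from the message):

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- search both fields; the boost on title is illustrative -->
    <str name="qf">title^2.0 body</str>
    <!-- mm=1: only one term is required, so single-word matches still
         return, but docs matching both terms naturally score higher -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```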

Thanks,
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-weight-proximity-ranking-which-one-tp1414512p1414512.html
Sent from the Solr - User mailing list archive at Nabble.com.


Numeric search in text field

2010-10-04 Thread javaxmlsoapdev

Hello,

I have a string "Marsh 1" in the index. If I put "Marsh 1" in the search box
(no quotes) I get the expected results back, but when I search for just "1"
(again, no quotes) I don't get any results back. I use
WordDelimiterFilterFactory as follows. Any idea?
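
The analyzer definition was stripped by the list archive. A typical text field using WordDelimiterFilterFactory looks something like the sketch below (attribute values are assumptions); note that if the query-time analyzer differs from the index-time one, or a length filter drops single-character tokens, a standalone "1" can fail to match:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateNumberParts="1" is what keeps the token "1" searchable -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```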
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Numeric-search-in-text-field-tp1633741p1633741.html
Sent from the Solr - User mailing list archive at Nabble.com.


Partial search extremely slow

2011-02-25 Thread javaxmlsoapdev

Since my users wanted partial-search functionality, I had to introduce the
following. I have declared two EdgeNGram filters, one with side "back" and
one with side "front", since they wanted partial search working from either
side.

When executing a search (which brings back 4K-plus records from the index),
response time is extremely slow.

The two DB columns which I index and search against are huge, and one of
them is of type CLOB. That is, this CLOB column is indexed with "edgyText"
and also searched upon. From the documentation I understand partial search
is slow due to its "gram" nature. What's the best way to implement this
functionality and still get good response time?
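
The fieldType definition itself did not survive the archive; a reconstruction matching the description in this thread (two EdgeNGram filters with sides "front" and "back"; the gram sizes and tokenizer are assumptions) might look like:

```xml
<fieldType name="edgyText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- grams anchored at the start of each token -->
    <filter class="solr.EdgeNGramFilterFactory" side="front"
            minGramSize="1" maxGramSize="25"/>
    <!-- grams anchored at the end of each token -->
    <filter class="solr.EdgeNGramFilterFactory" side="back"
            minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A minGramSize of 1 multiplies the index size and the number of candidate terms dramatically, which is consistent with the slowness described.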
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-search-extremly-slow-tp2572861p2572861.html
Sent from the Solr - User mailing list archive at Nabble.com.


Smart Pagination queries

2011-03-08 Thread javaxmlsoapdev
E.g., there are 4,000 Solr documents found for a particular word search. My
app applies entitlement rules to those 4,000 documents, and it's quite
possible that the user is only eligible to view 3,000 of the 4K results.
This is achieved through post-filtering application logic.

My question related to Solr pagination is: in order to paint "Next" links,
the app would have to know the total number of records the user is eligible
to read. getNumFound() tells me that Solr returned 4K records in total. If
there were no entitlement rules, it would be easy to determine how many
"Next" links to paint and, when the user clicks "Next", to pass the
appropriate "start" position in the Solr query. Since I have to apply the
post-filter as results are fetched from Solr, is there a better way to
achieve this? Because of post-filtering, I won't know whether to paint a
"Next" link until the results for the "next" pages are pre-fetched and
filtered, and pre-fetching everything would kill performance and defeat the
point of Solr pagination. Any better suggestion?
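
One workaround sketch (not from the thread; the entitlement check below is a stand-in): over-fetch from Solr in batches, post-filter, and keep one extra entitled row as a sentinel that decides whether a "Next" link is needed, without counting the whole result set.

```python
def fetch_page(solr_rows, is_entitled, page_size, start=0):
    """Return one page of entitled docs plus a has-next flag.

    solr_rows(start, rows) -> list of docs, simulating a Solr query
    is_entitled(doc) -> bool, the application's post-filter rule
    """
    entitled, cursor, batch = [], start, page_size * 2  # over-fetch factor
    # Collect page_size + 1 entitled docs; the extra one is only a
    # sentinel telling us that a "Next" link is needed.
    while len(entitled) <= page_size:
        docs = solr_rows(cursor, batch)
        if not docs:          # index exhausted
            break
        entitled.extend(d for d in docs if is_entitled(d))
        cursor += len(docs)
    return entitled[:page_size], len(entitled) > page_size, cursor

# Simulated index of 10 docs where odd ids fail the entitlement rule.
index = [{"id": i} for i in range(10)]
page, has_next, next_cursor = fetch_page(
    lambda start, rows: index[start:start + rows],
    lambda d: d["id"] % 2 == 0,
    page_size=3)
```

In a real implementation the entitled docs collected past the page boundary would be cached for the following page rather than discarded.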

Thanks,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Smart-Pagination-queries-tp2652273p2652273.html
Sent from the Solr - User mailing list archive at Nabble.com.


Custom request handler/plugin

2011-03-13 Thread javaxmlsoapdev
I have a requirement to filter Solr results, based on my application's
entitlement rules, before they are returned from Solr to the application. To
do this, I have to implement a custom plugin which gets the entire Solr
result set and applies my application rules (which will be exposed via a web
service from the app for Solr to consume). This way results will be
completely filtered inside Solr itself before the application gets the
response. Currently my search uses DisMax; now I will need to use DisMax
first and then the custom plugin to filter the results. What's the best way
to implement this? FYI: I have gone through the SolrPlugin wiki page but
need more info on how chaining (if possible) works for handlers, e.g. first
dismax and then the custom plugin/handler.
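
Regarding chaining: request handlers themselves are not chained in Solr 1.4, but a SearchHandler runs a chain of SearchComponents, and a custom component can be appended after the standard ones via last-components. A sketch (the component and class names are hypothetical):

```xml
<searchComponent name="entitlementFilter"
                 class="com.example.solr.EntitlementFilterComponent"/>

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
  </lst>
  <!-- runs after the query/facet/highlight components, so it sees the
       dismax results and can drop non-entitled documents -->
  <arr name="last-components">
    <str>entitlementFilter</str>
  </arr>
</requestHandler>
```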

Please advise.

Thanks,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-request-handler-plugin-tp2673822p2673822.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom request handler/plugin

2011-03-13 Thread javaxmlsoapdev
Below are the reasons why I thought it wouldn't be feasible to have
pre-filtered results with filter queries; please comment.

Since I can't write down the actual business requirements due to a
confidentiality contract with the client, I'll mock up the scenario with an
example.

- There is a parent entity called "Quiz", which has multiple "assignees".
Quiz has a one-to-many relationship with another entity called "School", and
"School" has multiple "Assessors".

- There are two separate indexes:
1) a Quiz index, where quiz title, details, and status are the searchable
attributes of the Quiz;
2) a School index, which has the school name and some other school-related
searchable attributes.

Now, when someone searches against quiz titles, for example, search results
are returned to the application. Before displaying the results, the
following access rules kick in. They are just Java rules in the application
which decide whether a person can view a particular quiz document returned
from Solr:

- The rule first checks against an external entitlement service, which has
an authorization policy defining entitlement roles. This external service
returns "true" or "false" for whether the person can "view" the "quiz"
entity at all.
  - If it returns false, that document is thrown out of the result set.
  - If it returns true, the rule further checks the assignees of the quiz
(from the database): only if the logged-in person is one of the assignees of
the quiz document s/he is trying to view can s/he view it.
  - There is a super-admin role in the entitlement service (again, external
to our app). If the logged-in person is a super admin, s/he can view the
document, with no need to check quiz assignments.

If I store quiz assignees in the index, then on-the-fly updates of documents
in the index would be too slow (yes, we do update/insert index documents on
the fly when a record is created or updated in the DB). Plus, as I
mentioned, for the super-admin role there aren't any assignments. If I
decided to store super admins in the index, there would have to be some
asynchronous thread running against the entitlement service to monitor the
"super admin" role and, every time a new person is added to that role, add
him to each indexed document's assignment list.

The whole thing looked too convoluted. Hence I thought it would be cleaner
to leverage the existing application logic (rules) and continue
post-processing. Pagination can only work correctly if the result set Solr
returns is already filtered; if it isn't, the application can't know how
many documents will be kicked out (and therefore how many "Next" links to
paint).

Sorry for the lengthy post, but I thought describing the entire scenario
would make things clearer from a requirements and infrastructure point of
view.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-request-handler-plugin-tp2673822p2674913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom request handler/plugin

2011-03-17 Thread javaxmlsoapdev
Thanks for the response. I have finally decided to build the access
intelligence into Solr itself, pre-filtering the results by storing in the
index the attributes required to determine access.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-request-handler-plugin-tp2673822p2696319.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr performance

2011-05-11 Thread javaxmlsoapdev
I have some 25-odd fields with stored="true" in schema.xml. Retrieving 5,000
records takes a few seconds. I also tried passing "fl" to include only one
field in the response, but the response time is still the same. What are the
things to look at to tune performance?
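
For reference, one standard Solr 1.4 setting often checked in this situation (whether it applies to this particular setup is an assumption): with lazy field loading enabled, the other stored fields are not read from disk when fl requests only one of them.

```xml
<!-- solrconfig.xml, inside the <query> section -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>
```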

Thanks, 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-tp2926836p2926836.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance

2011-05-13 Thread javaxmlsoapdev
Alright. It turned out that defaultSearchField is title, where the title
field is of the custom fieldType "edgyText" (the EdgeNGram-based type
described earlier).

So if no field is specified in the "q" parameter, Solr picks up the default
field, which is title of type "edgyText", and takes a very long time to
return results. Is there a way to dynamically IGNORE the default field
(which is set in schema.xml) when I only want to search on keys (e.g.
fl=keys)? The gram search is slowing things down extremely. The clients want
a minimum gram size of 1, which is kind of insane, but that's how it is.

Any idea?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-tp2926836p2935175.html
Sent from the Solr - User mailing list archive at Nabble.com.


can't find solr.xml

2009-10-30 Thread javaxmlsoapdev

I have downloaded apache-solr-1.3.0.tgz for Linux and don't see solr.xml.
Can someone assist?
-- 
View this message in context: 
http://old.nabble.com/can%27t-find-solr.xml-tp26136630p26136630.html
Sent from the Solr - User mailing list archive at Nabble.com.



Index documents with Solr

2009-11-04 Thread javaxmlsoapdev

I wanted to find out how people are using Solr's ExtractingRequestHandler to
index different types of documents from a configuration file, in an import
fashion. I want to use this handler similarly to how DataImportHandler
works, where you can issue an "import" command from the URL to create an
index by reading database table(s).

For documents, I have a DB table which stores file paths. I want to read
each file's location from the DB table and then create an index after
reading the document content using ExtractingRequestHandler. Again, I am
trying to see if all this can be done purely from configuration, the same
way DataImportHandler handles it.

-- 
View this message in context: 
http://old.nabble.com/Index-documents-with-Solr-tp26205991p26205991.html
Sent from the Solr - User mailing list archive at Nabble.com.



how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

I want to build an AND search query against field1 AND field2, etc. Both
these fields are stored in an index. I am migrating Lucene code to Solr.
Following is my existing Lucene code:

BooleanQuery currentSearchingQuery = new BooleanQuery();

currentSearchingQuery.add(titleDescQuery, Occur.MUST);
highlighter = new Highlighter(new QueryScorer(titleDescQuery));

TermQuery searchTechGroupQyery = new TermQuery(
        new Term("techGroup", searchForm.getTechGroup()));
currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);

TermQuery searchProgramQyery = new TermQuery(
        new Term("techProgram", searchForm.getTechProgram()));
currentSearchingQuery.add(searchProgramQyery, Occur.MUST);

What's the equivalent Solr code for the above Lucene code? Any samples would
be appreciated.
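
For reference, in the Lucene query syntax that Solr accepts, each Occur.MUST clause becomes a +-prefixed term. With the field names from the Lucene code above (the values are placeholders), the q parameter would be roughly:

```
q=+techGroup:someGroup +techProgram:someProgram
```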

Thanks,
-- 
View this message in context: 
http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

I already did dive in before. I am using the SolrJ API and the SolrQuery
object to build the query, but it's not clearly documented how to build a
BooleanQuery ANDing a bunch of different attributes in the index. Any
samples, please?

Avlesh Singh wrote:
> 
> Dive in - http://wiki.apache.org/solr/Solrj
> 
> Cheers
> Avlesh
> 
> On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev  wrote:
> 
>>
>> I want to build AND search query against field1 AND field2 etc. Both
>> these
>> fields are stored in an index. I am migrating lucene code to Solr.
>> Following
>> is my existing lucene code
>>
>> BooleanQuery currentSearchingQuery = new BooleanQuery();
>>
>> currentSearchingQuery.add(titleDescQuery,Occur.MUST);
>> highlighter = new Highlighter( new QueryScorer(titleDescQuery));
>>
>> TermQuery searchTechGroupQyery = new TermQuery(new Term
>> ("techGroup",searchForm.getTechGroup()));
>>currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);
>> TermQuery searchProgramQyery = new TermQuery(new
>> Term("techProgram",searchForm.getTechProgram()));
>>currentSearchingQuery.add(searchProgramQyery, Occur.MUST);
>> }
>>
>> What's the equivalent Solr code for above Luce code. Any samples would be
>> appreciated.
>>
>> Thanks,
>> --
>> View this message in context:
>> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

I think I found the answer; I needed to read more API documentation :-)

You can do it using solrQuery.setFilterQueries() and build AND queries of
multiple parameters.

Avlesh Singh wrote:
> 
> For a starting point, this might be a good read -
> http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query
> 
> Cheers
> Avlesh
> 
> On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev 
> wrote:
> 
>>
>> I already did  dive in before. I am using solrj API and SolrQuery object
>> to
>> build query. but its not clear/written how to build booleanQuery ANDing
>> bunch of different attributes in the index. Any samples please?
>>
>> Avlesh Singh wrote:
>> >
>> > Dive in - http://wiki.apache.org/solr/Solrj
>> >
>> > Cheers
>> > Avlesh
>> >
>> > On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev 
>> wrote:
>> >
>> >>
>> >> I want to build AND search query against field1 AND field2 etc. Both
>> >> these
>> >> fields are stored in an index. I am migrating lucene code to Solr.
>> >> Following
>> >> is my existing lucene code
>> >>
>> >> BooleanQuery currentSearchingQuery = new BooleanQuery();
>> >>
>> >> currentSearchingQuery.add(titleDescQuery,Occur.MUST);
>> >> highlighter = new Highlighter( new QueryScorer(titleDescQuery));
>> >>
>> >> TermQuery searchTechGroupQyery = new TermQuery(new Term
>> >> ("techGroup",searchForm.getTechGroup()));
>> >>currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);
>> >> TermQuery searchProgramQyery = new TermQuery(new
>> >> Term("techProgram",searchForm.getTechProgram()));
>> >>currentSearchingQuery.add(searchProgramQyery, Occur.MUST);
>> >> }
>> >>
>> >> What's the equivalent Solr code for above Luce code. Any samples would
>> be
>> >> appreciated.
>> >>
>> >> Thanks,
>> >> --
>> >> View this message in context:
>> >>
>> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

Great, thanks. That was helpful.

Avlesh Singh wrote:
> 
>>
>> you can do it using
>> solrQuery.setFilterQueries() and build AND queries of multiple
>> parameters.
>>
> Nope. You would need to read more -
> http://wiki.apache.org/solr/FilterQueryGuidance
> 
> For your impatience, here's a quick starter -
> 
> #and between two fields
> solrQuery.setQuery("+field1:foo +field2:bar");
> 
> #or between two fields
> solrQuery.setQuery("field1:foo field2:bar");
> 
> Cheers
> Avlesh
> 
> On Fri, Nov 13, 2009 at 10:35 PM, javaxmlsoapdev 
> wrote:
> 
>>
>> I think I found the answer. needed to read more API documentation :-)
>>
>> you can do it using
>> solrQuery.setFilterQueries() and build AND queries of multiple
>> parameters.
>>
>>
>> Avlesh Singh wrote:
>> >
>> > For a starting point, this might be a good read -
>> >
>> http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query
>> >
>> > Cheers
>> > Avlesh
>> >
>> > On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev 
>> > wrote:
>> >
>> >>
>> >> I already did  dive in before. I am using solrj API and SolrQuery
>> object
>> >> to
>> >> build query. but its not clear/written how to build booleanQuery
>> ANDing
>> >> bunch of different attributes in the index. Any samples please?
>> >>
>> >> Avlesh Singh wrote:
>> >> >
>> >> > Dive in - http://wiki.apache.org/solr/Solrj
>> >> >
>> >> > Cheers
>> >> > Avlesh
>> >> >
>> >> > On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev 
>> >> wrote:
>> >> >
>> >> >>
>> >> >> I want to build AND search query against field1 AND field2 etc.
>> Both
>> >> >> these
>> >> >> fields are stored in an index. I am migrating lucene code to Solr.
>> >> >> Following
>> >> >> is my existing lucene code
>> >> >>
>> >> >> BooleanQuery currentSearchingQuery = new BooleanQuery();
>> >> >>
>> >> >> currentSearchingQuery.add(titleDescQuery,Occur.MUST);
>> >> >> highlighter = new Highlighter( new QueryScorer(titleDescQuery));
>> >> >>
>> >> >> TermQuery searchTechGroupQyery = new TermQuery(new Term
>> >> >> ("techGroup",searchForm.getTechGroup()));
>> >> >>currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);
>> >> >> TermQuery searchProgramQyery = new TermQuery(new
>> >> >> Term("techProgram",searchForm.getTechProgram()));
>> >> >>currentSearchingQuery.add(searchProgramQyery, Occur.MUST);
>> >> >> }
>> >> >>
>> >> >> What's the equivalent Solr code for above Luce code. Any samples
>> would
>> >> be
>> >> >> appreciated.
>> >> >>
>> >> >> Thanks,
>> >> >> --
>> >> >> View this message in context:
>> >> >>
>> >>
>> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
>> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26340776.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to use DataImportHandler with ExtractingRequestHandler?

2009-11-20 Thread javaxmlsoapdev

Did you extend DIH to do this work? Can you share code samples? I have a
similar requirement where I need to index database records, and each record
has a column with a document path, so I need to create another index for the
documents (we allow users to search both indexes separately), in parallel
with reading some metadata about the documents from the database as well. I
have all sorts of different document formats to index. FYI, I am on Solr
1.4.0. Any pointers would be appreciated.

Thanks,

Sascha Szott wrote:
> 
> Hi Khai,
> 
> a few weeks ago, I was facing the same problem.
> 
> In my case, this workaround helped (assuming, you're using Solr 1.3): 
> For each row, extract the content from the corresponding pdf file using 
> a parser library of your choice (I suggest Apache PDFBox or Apache Tika 
> in case you need to process other file types as well), put it between
> 
>   <foo> ... </foo>
> 
> and store it in a text file. To keep the relationship between a file and 
> its corresponding database row, use the primary key as the file name.
> 
> Within data-config.xml use the XPathEntityProcessor as follows (replace 
> dbRow and primaryKey respectively):
> 
>   <entity processor="XPathEntityProcessor"
>           forEach="/foo"
>           url="${dbRow.primaryKey}.xml">
>     ...
>   </entity>
> 
> 
> 
> And, by the way, in Solr 1.4 you do not have to put your content between 
> xml tags: use the PlainTextEntityProcessor instead of
> XPathEntityProcessor.
> 
> Best,
> Sascha
> 
> Khai Doan schrieb:
>> Hi all,
>> 
>> My name is Khai.  I have a table in a relational database.  I have
>> successfully use DataImportHandler to import this data into Apache Solr.
>> However, one of the column store the location of PDF file.  How can I
>> configure DataImportHandler to use ExtractingRequestHandler to extract
>> the
>> content of the PDF?
>> 
>> Thanks!
>> 
>> Khai Doan
>> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/How-to-use-DataImportHandler-with-ExtractingRequestHandler--tp25267745p26443544.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Index documents with Solr

2009-11-20 Thread javaxmlsoapdev

Glock, did you get this approach to work? Let me know.

Thanks,

Glock, Thomas wrote:
> 
> I have a similar situation but not expecting any easy setup.  Currently
> the tables contain both a url to the file and quite a bit of additional
> metadata about the file.  I'm planning one initial load to Solr by
> creating xml in my own utility which posts the xml.  Data is messy so DIH
> is not a good choice for this situation.  After the initial load (only
> ~12K documents - takes 10 minutes tops); I plan to perform a second pass
> which will use the extractingrequesthandler.  I know how the id will map
> but not clear yet how to get that id to ExtractingRequestHandler. Would be
> good to see different examples on the Wiki. Have not yet had a first
> attempt - hoping to in a day or so.
> 
> 
> -----Original Message-
> From: javaxmlsoapdev [mailto:vika...@yahoo.com]
> Sent: Wed 04-Nov-2009 5:42 PM
> To: solr-user@lucene.apache.org
> Subject: Index documents with Solr
>  
> 
> Wanted to find out how people are using Solr's ExtractingRequestHandler to
> index different types of documents from a configuration file in an import
> fashion. I want to use this handler in a similar way how DataImportHandler
> works where you can issue "import" command from the URL to create an index
> reading database table(s). 
> 
> For documents, I have a db table which stores files paths. Want to read
> file's location from a db table then create an index after reading
> document
> content using ExtractingRequestHandler. Again trying to see if all this
> can
> be done just from a configuration same way how DataImportHandler handles
> this
> 
> -- 
> View this message in context:
> http://old.nabble.com/Index-documents-with-Solr-tp26205991p26205991.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Index-documents-with-Solr-tp26205991p26443551.html
Sent from the Solr - User mailing list archive at Nabble.com.



Very busy search screen

2009-11-23 Thread javaxmlsoapdev

I have a client who wants to search on almost every attribute of an object
(nearly 15 attributes) on the search screen. The search screen looks very
crazy/busy. I was wondering if there are better ways to address these
requirements and build intelligent, categorized/configurable searches,
including allowing the user to choose whether to AND or OR attributes, etc.
Any pointers would be appreciated.

thanks,
-- 
View this message in context: 
http://old.nabble.com/Very-busy-search-screen-tp26482092p26482092.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to use DataImportHandler with ExtractingRequestHandler?

2009-11-23 Thread javaxmlsoapdev

Anyone have any idea?

javaxmlsoapdev wrote:
> 
> did you extend DIH to do this work? can you share code samples. I have
> similar requirement where I need tp index database records and each record
> has a column with document path so need to create another index for
> documents (we allow users to search both index separately) in parallel
> with reading some meta data of documents from database as well. I have all
> sorts of different document formats to index. I am on solr 1.4.0. Any
> pointers would be appreciated.
> 
> Thanks,
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/How-to-use-DataImportHandler-with-ExtractingRequestHandler--tp25267745p26485245.html
Sent from the Solr - User mailing list archive at Nabble.com.



ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-23 Thread javaxmlsoapdev

The following code is from my test case, which tries to index a file (of
type .txt):
ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978"); //key is the uniqueId
up.setParam("ext.literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  
server.request(up); 

The test case doesn't give me any error and, I think, it's indexing the
file, but when I search for text which was part of the .txt file, the search
doesn't return anything.

Following is the config from solrconfig.xml, where I have mapped the content
to the "description" field (the default search field) in the schema.



<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="...">description</str>
    <str name="...">description</str>
  </lst>
</requestHandler>

  

Clearly it seems I am missing something. Any idea?
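
For reference, the archive stripped the XML of this handler definition. In stock Solr 1.4, an ExtractingRequestHandler configuration mapping the extracted content into a "description" field would look roughly like the following (the parameter names fmap.content and defaultField are an assumption based on the Solr 1.4 example solrconfig, since the original names did not survive):

```xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map the body text Tika extracts into the default search field -->
    <str name="fmap.content">description</str>
    <!-- field to receive content when no explicit mapping applies -->
    <str name="defaultField">description</str>
  </lst>
</requestHandler>
```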

Thanks,
-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26486817.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-23 Thread javaxmlsoapdev

*:* returns me 1 result, but when I search for a specific word (which was
part of the .txt file I indexed before) it doesn't return anything. I don't
have Luke set up on my end; let me see if I can set that up quickly. But
otherwise, do you see anything I am missing in the solrconfig mapping,
something that maps the document "content" to the wrong attribute?

thanks,

Grant Ingersoll-6 wrote:
> 
> 
> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
> 
>> 
>> Following code is from my test case where it tries to index a file (of
>> type
>> .txt)
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978"); //key is the uniqueId
>> up.setParam("ext.literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);   
>> server.request(up);  
>> 
>> test case doesn't give me any error and "I think" its indexing the file?
>> but
>> when I search for a text (which was part of the .txt file) search doesn't
>> return me anything.
> 
> What do your logs show?  Else, what does Luke show or doing a *:* query
> (assuming this is the only file you added)?
> 
> Also, I don't think you need ext.literal anymore, just literal.
> 
>> 
>> Following is the config from solrconfig.xml where I have mapped content
>> to
>> "description" field(default search field) in the schema.
>> 
>> > class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>
>>  description
>>  description
>>
>>  
>> 
>> Clearly it seems I am missing something. Any idea?
> 
> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-23 Thread javaxmlsoapdev

FYI: weirdly, it's returning me the following when I run
rsp.getResults().get(0).getFieldValue("description"):

[702, text/plain, doc123.txt, ]

So it seems like it's storing the value from
up.setParam("ext.literal.docName", "doc123.txt") and the other metadata into
"description", rather than the file content.

Any idea?

Thanks,

javaxmlsoapdev wrote:
> 
> *:* returns me 1 count but when I search for specific word (which was part
> of .txt file I indexed before) it doesn't return me anything. I don't have
> luke setup on my end. let me see if I can set that up quickly but
> otherwise do you see anything I am missing in solrconfig mapping or
> something? which maps document "content" to wrong attribute?
> 
> thanks,
> 
> Grant Ingersoll-6 wrote:
>> 
>> 
>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>> 
>>> 
>>> Following code is from my test case where it tries to index a file (of
>>> type
>>> .txt)
>>> ContentStreamUpdateRequest up = new
>>> ContentStreamUpdateRequest("/update/extract");
>>> up.addFile(fileToIndex);
>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>> up.setParam("ext.literal.docName", "doc123.txt");
>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  
>>> server.request(up); 
>>> 
>>> test case doesn't give me any error and "I think" its indexing the file?
>>> but
>>> when I search for a text (which was part of the .txt file) search
>>> doesn't
>>> return me anything.
>> 
>> What do your logs show?  Else, what does Luke show or doing a *:* query
>> (assuming this is the only file you added)?
>> 
>> Also, I don't think you need ext.literal anymore, just literal.
>> 
>>> 
>>> Following is the config from solrconfig.xml where I have mapped content
>>> to
>>> "description" field(default search field) in the schema.
>>> 
>>> >> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>
>>>  description
>>>  description
>>>
>>>  
>>> 
>>> Clearly it seems I am missing something. Any idea?
>> 
>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487409.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-24 Thread javaxmlsoapdev

http://machinename:port/solr/admin/luke gives me a 404 error, so it seems
it's not able to find Luke.

I am reusing a schema which is used for indexing another entity from the
database and which has no relevance to the documents. That was my next
question: what do I put in a schema if my documents don't need any column
mappings or anything? Plus, I want to keep the file-document index separate
from the database-entity index. If I don't have any DB columns to map, and
the document index should live separate from the DB-entity index, what's the
best way to achieve this?
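
On keeping the two indexes separate: the usual Solr 1.4 answer is multicore, where a single solr.xml declares one core per index, each with its own schema.xml and solrconfig.xml. A sketch (the core names are placeholders):

```xml
<!-- solr.xml in the Solr home directory -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core per index, each with its own conf/ directory -->
    <core name="dbEntities" instanceDir="dbEntities"/>
    <core name="documents" instanceDir="documents"/>
  </cores>
</solr>
```

Each core is then addressed under its own path, e.g. /solr/documents/update/extract versus /solr/dbEntities/select.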

thanks,



Grant Ingersoll-6 wrote:
> 
> 
> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
> 
>> 
>> *:* returns me 1 count but when I search for specific word (which was
>> part of
>> .txt file I indexed before) it doesn't return me anything. I don't have
>> luke
>> setup on my end.
> 
> http://localhost:8983/solr/admin/luke should give you some info.
> 
> 
>> let me see if I can set that up quickly but otherwise do
>> you see anything I am missing in solrconfig mapping or something?
> 
> What's your schema look like and how are you querying?
> 
>> which maps
>> document "content" to wrong attribute?
>> 
>> thanks,
>> 
>> Grant Ingersoll-6 wrote:
>>> 
>>> 
>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>> 
>>>> 
>>>> Following code is from my test case where it tries to index a file (of
>>>> type
>>>> .txt)
>>>> ContentStreamUpdateRequest up = new
>>>> ContentStreamUpdateRequest("/update/extract");
>>>> up.addFile(fileToIndex);
>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); 
>>>> server.request(up);
>>>> 
>>>> test case doesn't give me any error and "I think" its indexing the
>>>> file?
>>>> but
>>>> when I search for a text (which was part of the .txt file) search
>>>> doesn't
>>>> return me anything.
>>> 
>>> What do your logs show? Otherwise, what does Luke show, and what does a *:*
>>> query return (assuming this is the only file you added)?
>>> 
>>> Also, I don't think you need ext.literal anymore, just literal.
>>> 
>>>> 
>>>> Following is the config from solrconfig.xml where I have mapped content
>>>> to
>>>> "description" field(default search field) in the schema.
>>>> 
>>>> >>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>   
>>>> description
>>>> description
>>>>   
>>>> 
>>>> 
>>>> Clearly it seems I am missing something. Any idea?
>>> 
>>> 
>>> 
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>> 
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>> 
>>> 
>>> 
>> 
>> -- 
>> View this message in context:
>> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26497295.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-24 Thread javaxmlsoapdev

I was able to configure /docs index separately from my db data index.

I am still seeing the same behavior, where it only puts the docName and file
size in the "content" field (I have renamed the field to "content" in this new
schema).

Below are the only two fields I have in schema.xml:
<field name="key" type="slong" indexed="true" stored="true" required="true" />
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

Following is updated code from test case

File fileToIndex = new File("file.txt");

ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978");
up.setParam("literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList list = server.request(up);
assertNotNull("Couldn't upload .txt",list);

QueryResponse rsp = server.query( new SolrQuery( "*:*") );
assertEquals( 1, rsp.getResults().getNumFound() );
System.out.println(rsp.getResults().get(0).getFieldValue("content"));

Also, from the Solr admin UI, only when I search for "doc123.txt" does it
return the following response. Not sure why it's not indexing the file's
content into the "content" attribute.
<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="content">
        <str>702</str>
        <str>text/plain</str>
        <str>doc123.txt</str>
        <str/>
      </arr>
      <str name="key">8978</str>
    </doc>
  </result>
</response>

Any idea?

Thanks,


javaxmlsoapdev wrote:
> 
> http://machinename:port/solr/admin/luke gives me 404 error so seems like
> its not able to find luke.
> 
> I am reusing schema, which is used for indexing other entity from
> database, which has no relevance to documents. that was my next question
> that what do I put in, in a schema if my documents don't need any column
> mappings or anything. plus I want to keep file documents index separately
> from database entity index. what's the best way to do this? If I don't
> have any db columns etc to map and file documents index should leave
> separate from db entity index, what's the best way to achieve this.
> 
> thanks,
> 
> 
> 
> Grant Ingersoll-6 wrote:
>> 
>> 
>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>> 
>>> 
>>> *:* returns me 1 count but when I search for specific word (which was
>>> part of
>>> .txt file I indexed before) it doesn't return me anything. I don't have
>>> luke
>>> setup on my end.
>> 
>> http://localhost:8983/solr/admin/luke should give you some info.
>> 
>> 
>>> let me see if I can set that up quickly but otherwise do
>>> you see anything I am missing in solrconfig mapping or something?
>> 
>> What's your schema look like and how are you querying?
>> 
>>> which maps
>>> document "content" to wrong attribute?
>>> 
>>> thanks,
>>> 
>>> Grant Ingersoll-6 wrote:
>>>> 
>>>> 
>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>> 
>>>>> 
>>>>> Following code is from my test case where it tries to index a file (of
>>>>> type
>>>>> .txt)
>>>>> ContentStreamUpdateRequest up = new
>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>> up.addFile(fileToIndex);
>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>>> server.request(up);   
>>>>> 
>>>>> test case doesn't give me any error and "I think" its indexing the
>>>>> file?
>>>>> but
>>>>> when I search for a text (which was part of the .txt file) search
>>>>> doesn't
>>>>> return me anything.
>>>> 
>>>> What do your logs show? Otherwise, what does Luke show, and what does a
>>>> *:* query return (assuming this is the only file you added)?
>>>> 
>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>> 
>>>>> 
>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>> content
>>>>> to
>>>>> "description" field(default search field) in the schema.
>>>>> 
>>>>> >>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>>   
>>>>> description
>>>>> description
>>>>>   
>>>>> 
>>>>> 
>>>>> Clearly it seems I am missing something. Any idea?
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>> 
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>> using
>>>> Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>> 
>>>> 
>>>> 
>>> 
>>> -- 
>>> View this message in context:
>>> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26498552.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-24 Thread javaxmlsoapdev

Following is the Luke response. <lst name="fields"/> is empty. Can someone
help me find out why the file content isn't being indexed?

   
 
 
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="index">
    <int name="numDocs">0</int>
    <int name="maxDoc">0</int>
    <int name="numTerms">0</int>
    <long name="version">1259085661332</long>
    <bool name="optimized">false</bool>
    <bool name="current">true</bool>
    <bool name="hasDeletions">false</bool>
    <str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
    <date name="lastModified">2009-11-24T18:01:01Z</date>
  </lst>
  <lst name="fields"/>
  <lst name="info">
    <lst name="key">
      <str name="I">Indexed</str>
      <str name="T">Tokenized</str>
      <str name="S">Stored</str>
      <str name="M">Multivalued</str>
      <str name="V">TermVector Stored</str>
      <str name="o">Store Offset With TermVector</str>
      <str name="p">Store Position With TermVector</str>
      <str name="O">Omit Norms</str>
      <str name="L">Lazy</str>
      <str name="B">Binary</str>
      <str name="C">Compressed</str>
      <str name="f">Sort Missing First</str>
      <str name="l">Sort Missing Last</str>
    </lst>
    <str name="NOTE">Document Frequency (df) is not updated when a document is
marked for deletion. df values include deleted documents.</str>
  </lst>
</response>

javaxmlsoapdev wrote:
> 
> I was able to configure /docs index separately from my db data index.
> 
> still I am seeing same behavior where it only puts .docName & its size in
> the "content" field (I have renamed field to "content" in this new schema)
> 
> below are the only two fields I have in schema.xml
> <field name="key" type="slong" indexed="true" stored="true" required="true" />
> <field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
> 
> Following is updated code from test case
> 
> File fileToIndex = new File("file.txt");
> 
> ContentStreamUpdateRequest up = new
> ContentStreamUpdateRequest("/update/extract");
> up.addFile(fileToIndex);
> up.setParam("literal.key", "8978");
> up.setParam("literal.docName", "doc123.txt");
> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> NamedList list = server.request(up);
> assertNotNull("Couldn't upload .txt",list);
>   
> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
> assertEquals( 1, rsp.getResults().getNumFound() );
> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
> 
> Also from solr admin UI when I search for "doc123.txt" then only it
> returns me following response. not sure why its not indexing file's
> content into "content" attribute.
> <response>
>   <result name="response" numFound="1" start="0">
>     <doc>
>       <arr name="content">
>         <str>702</str>
>         <str>text/plain</str>
>         <str>doc123.txt</str>
>         <str/>
>       </arr>
>       <str name="key">8978</str>
>     </doc>
>   </result>
> </response>
> 
> Any idea?
> 
> Thanks,
> 
> 
> javaxmlsoapdev wrote:
>> 
>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>> its not able to find luke.
>> 
>> I am reusing schema, which is used for indexing other entity from
>> database, which has no relevance to documents. that was my next question
>> that what do I put in, in a schema if my documents don't need any column
>> mappings or anything. plus I want to keep file documents index separately
>> from database entity index. what's the best way to do this? If I don't
>> have any db columns etc to map and file documents index should leave
>> separate from db entity index, what's the best way to achieve this.
>> 
>> thanks,
>> 
>> 
>> 
>> Grant Ingersoll-6 wrote:
>>> 
>>> 
>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>> 
>>>> 
>>>> *:* returns me 1 count but when I search for specific word (which was
>>>> part of
>>>> .txt file I indexed before) it doesn't return me anything. I don't have
>>>> luke
>>>> setup on my end.
>>> 
>>> http://localhost:8983/solr/admin/luke should give you some info.
>>> 
>>> 
>>>> let me see if I can set that up quickly but otherwise do
>>>> you see anything I am missing in solrconfig mapping or something?
>>> 
>>> What's your schema look like and how are you querying?
>>> 
>>>> which maps
>>>> document "content" to wrong attribute?
>>>> 
>>>> thanks,
>>>> 
>>>> Grant Ingersoll-6 wrote:
>>>>> 
>>>>> 
>>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>> 
>>>>>> 
>>>>>> Following code is from my test case where it tries to index a file
>>>>>> (of
>>>>>> type
>>>>>> .txt)
>>>>>> ContentStreamUpdateRequest up = new
>>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>>> up.addFile(fileToIndex);
>>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);   
>>>>>> server.reque

Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-25 Thread javaxmlsoapdev

Grant, can you assist? I am clueless as to why it's not indexing the content
of the file. I have provided the schema and code info below/in previous
threads. Do I need to explicitly add param("content", ...) to the
ContentStreamUpdateRequest? I don't think that's the right thing to do. Please
advise.

let me know if you need anything else. Appreciate your help.

Thanks,

javaxmlsoapdev wrote:
> 
> Following is the Luke response. <lst name="fields"/> is empty. Can someone
> help me find out why the file content isn't being indexed?
> 
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>   </lst>
>   <lst name="index">
>     <int name="numDocs">0</int>
>     <int name="maxDoc">0</int>
>     <int name="numTerms">0</int>
>     <long name="version">1259085661332</long>
>     <bool name="optimized">false</bool>
>     <bool name="current">true</bool>
>     <bool name="hasDeletions">false</bool>
>     <str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
>     <date name="lastModified">2009-11-24T18:01:01Z</date>
>   </lst>
>   <lst name="fields"/>
>   <lst name="info">
>     <lst name="key">
>       <str name="I">Indexed</str>
>       <str name="T">Tokenized</str>
>       <str name="S">Stored</str>
>       <str name="M">Multivalued</str>
>       <str name="V">TermVector Stored</str>
>       <str name="o">Store Offset With TermVector</str>
>       <str name="p">Store Position With TermVector</str>
>       <str name="O">Omit Norms</str>
>       <str name="L">Lazy</str>
>       <str name="B">Binary</str>
>       <str name="C">Compressed</str>
>       <str name="f">Sort Missing First</str>
>       <str name="l">Sort Missing Last</str>
>     </lst>
>     <str name="NOTE">Document Frequency (df) is not updated when a document
> is marked for deletion. df values include deleted documents.</str>
>   </lst>
> </response>
> 
> javaxmlsoapdev wrote:
>> 
>> I was able to configure /docs index separately from my db data index.
>> 
>> still I am seeing same behavior where it only puts .docName & its size in
>> the "content" field (I have renamed field to "content" in this new
>> schema)
>> 
>> below are the only two fields I have in schema.xml
>> > required="true" /> 
>> > multiValued="true"/>  
>> 
>> Following is updated code from test case
>> 
>> File fileToIndex = new File("file.txt");
>> 
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978");
>> up.setParam("literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> NamedList list = server.request(up);
>> assertNotNull("Couldn't upload .txt",list);
>>  
>> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
>> assertEquals( 1, rsp.getResults().getNumFound() );
>> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>> 
>> Also from solr admin UI when I search for "doc123.txt" then only it
>> returns me following response. not sure why its not indexing file's
>> content into "content" attribute.
>> - 
>> - 
>> - 
>>   702 
>>   text/plain 
>>   doc123.txt 
>>
>>   
>>   8978 
>>   
>>   
>> 
>> Any idea?
>> 
>> Thanks,
>> 
>> 
>> javaxmlsoapdev wrote:
>>> 
>>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>>> its not able to find luke.
>>> 
>>> I am reusing schema, which is used for indexing other entity from
>>> database, which has no relevance to documents. that was my next question
>>> that what do I put in, in a schema if my documents don't need any column
>>> mappings or anything. plus I want to keep file documents index
>>> separately from database entity index. what's the best way to do this?
>>> If I don't have any db columns etc to map and file documents index
>>> should leave separate from db entity index, what's the best way to
>>> achieve this.
>>> 
>>> thanks,
>>> 
>>> 
>>> 
>>> Grant Ingersoll-6 wrote:
>>>> 
>>>> 
>>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>>> 
>>>>> 
>>>>> *:* returns me 1 count but when I search for specific word (which was
>>>>> part of
>>>>> .txt file I indexed before) it doesn't return me anything. I don't
>>>>> have luke
>>>>> setup on my end.
>>>> 
>>>> http://localhost:8983/solr/admin/luke should give you some info.
>>>> 
>>>> 
>>>>> let me see if I can set that up quickly but otherwise do
>>>>> you see anything I am missing in solrconfig mapping or something?
>>>> 
>>>> What's your schema look like and how are you querying?
>>>> 
>>>>> which maps
>

Where to put ExternalRequestHandler and Tika jars

2009-11-25 Thread javaxmlsoapdev

My SOLR_HOME =/home/solr_1_4_0/apache-solr-1.4.0/example/solr/conf in
tomcat.sh

POI, PDFBox, Tika and related jars are under
/home/solr_1_4_0/apache-solr-1.4.0/lib

When I try to index files using the SolrJ API as follows, I don't see the
content of the file being indexed. It only indexes the file size (bytes) and
file type into the "content" field. See the schema definition below as well.
ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");
up.addFile(file);
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);

schema.xml has the following:

<field name="key" type="slong" indexed="true" stored="true" required="true"/>
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

<defaultSearchField>content</defaultSearchField>

And solrconfig.xml has:

<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">content</str>
    <str name="defaultField">content</str>
  </lst>
</requestHandler>

The Luke response is below; it displays the correct count (7) of indexed
documents but no "content" in the index. In the Tomcat logs I don't see any
errors. Unless I am going blind, I don't see anything missing in the setup.
Can anyone advise? Do I need to include the Tika jars in Tomcat's deployed
solr/lib or under /example/lib in SOLR_HOME?

   
- 
- 
  0 
  28 
  
- 
  7 
  7 
  25 
  1259164190261 
  false 
  true 
  false 
  org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index
 
  2009-11-25T15:50:03Z 
  
- 
- 
  text 
  ITSM-- 
  ITS-- 
  7 
  18 
- 
  3 
  3 
  3 
  3 
  2 
  2 
  1 
  1 
  1 
  1 
  
- 
  12 
  2 
  4 
  
  
- 
  slong 
  I-SO-l 
  I-SO- 
  7 
  7 
- 
  1 
  1 
  1 
  1 
  1 
  1 
  1 
  
- 
  7 
  
  
  
- 
- 
  Indexed 
  Tokenized 
  Stored 
  Multivalued 
  TermVector Stored 
  Store Offset With TermVector 
  Store Position With TermVector 
  Omit Norms 
  Lazy 
  Binary 
  Compressed 
  Sort Missing First 
  Sort Missing Last 
  
  Document Frequency (df) is not updated when a document is
marked for deletion. df values include deleted documents. 
  
  
-- 
View this message in context: 
http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26515579.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Where to put ExternalRequestHandler and Tika jars

2009-11-25 Thread javaxmlsoapdev

I had to include the Tika and related parsing jars in
tomcat/webapps/solr/WEB-INF/lib. This was an embarrassing mistake;
apologies for all the noise.

Thanks,
-- 
View this message in context: 
http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26518100.html
Sent from the Solr - User mailing list archive at Nabble.com.



Batch file upload using solrJ API

2009-11-25 Thread javaxmlsoapdev

Is there an API to upload files over one connection, versus looping through
all the files and creating a new ContentStreamUpdateRequest for each file?
This, as expected, doesn't work when there are a large number of files; it
quickly runs into memory problems. Please advise.
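As far as I can tell there is no single-connection bulk API in SolrJ 1.4; a common workaround is to send the files in fixed-size batches and commit once per batch rather than per file. Below is a minimal, hypothetical sketch of just the batching logic (the SolrJ calls are indicated only in comments, and the batch size of 10 is an assumption to tune):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class BatchUploader {
    // Split the file list into fixed-size batches so each
    // ContentStreamUpdateRequest stays small and memory-bounded.
    static List<List<File>> partition(List<File> files, int batchSize) {
        List<List<File>> batches = new ArrayList<List<File>>();
        for (int i = 0; i < files.size(); i += batchSize) {
            int end = Math.min(i + batchSize, files.size());
            batches.add(new ArrayList<File>(files.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<File>();
        for (int i = 0; i < 25; i++) {
            files.add(new File("doc" + i + ".txt"));
        }
        List<List<File>> batches = partition(files, 10);
        // For each batch: create one ContentStreamUpdateRequest,
        // addFile() every file in the batch, then commit once at the end.
        System.out.println(batches.size()); // 3 batches: 10 + 10 + 5
    }
}
```

Committing once per batch instead of once per file is usually the bigger win; per-file commits are what make large runs slow and memory-hungry.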

Thanks,


-- 
View this message in context: 
http://old.nabble.com/Batch-file-upload-using-solrJ-API-tp26518167p26518167.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Where to put ExternalRequestHandler and Tika jars

2009-11-30 Thread javaxmlsoapdev

Yes, the code I posted in the first thread does work, and I am able to
retrieve data from the document index. Did you include all required jars in
the deployed Solr application's lib folder? What errors are you seeing?

Juan Pedro Danculovic wrote:
> 
> Hi! Does your example finally work? I index the data with solrj and I have
> the same problem and could not retrieve file data.
> 
> 
> On Wed, Nov 25, 2009 at 3:41 PM, javaxmlsoapdev  wrote:
> 
>>
>> g. I had to include tika and related parsing jars into
>> tomcat/webapps/solr/WEB-INF/lib.. this was an embarrassing mistake.
>> apologies for all the noise.
>>
>> Thanks,
>> --
>> View this message in context:
>> http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26518100.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26576242.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Batch file upload using solrJ API

2009-11-30 Thread javaxmlsoapdev

Any suggestions/pointers on this?

javaxmlsoapdev wrote:
> 
> Is there an API to upload files over one connection versus looping through
> all the files and creating new ContentStreamUpdateRequest for each file.
> This, as expected, doesn't work if there are large number of files and
> quickly run into memory problems. Please advise.
> 
> Thanks,
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Batch-file-upload-using-solrJ-API-tp26518167p26576268.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr plugin or something else for custom work?

2009-11-30 Thread javaxmlsoapdev

I have a requirement where I am indexing attachments. Attachments hang off of
a database entity (table). I also need to include some metadata from the
database table as part of the index. I am trying to find the best way to
implement this, perhaps using a custom handler, where the handler gets all
required db records (which include the document path) by consuming a web
service (I can expose a method from my application as a web service), then
iterates through the returned list and indexes the required metadata along
with the attachments (the attachment path is part of an entity's metadata).
Has anyone tried something like this, or have suggestions on how best to
implement this requirement?
-- 
View this message in context: 
http://old.nabble.com/Solr-plugin-or-something-else-for-custom-work--tp26577014p26577014.html
Sent from the Solr - User mailing list archive at Nabble.com.



dismax query syntax to replace standard query

2009-12-03 Thread javaxmlsoapdev

I have configured the dismax handler to search against both the "title" and
"description" fields. Now I have some other attributes on the page, e.g.
"status", "name", etc. On the search page I have three fields for the user to
input search values:

1)Free text search field (which searchs against both "title" &
"description")
2)Status (multi select dropdown)
3)name(single select dropdown)

I want to form a query like textField1:value AND status:(Male OR Female) AND
name:"abc". I know the first part (textField1:value searches against both
"title" and "description", as that's how I have configured dismax), but I am
not sure how I can AND the other attributes (in my case "status" and "name").

Note: the standard query looks like the following (without the dismax handler):
title:"test" description:"test" name:"Joe" statusName:(Male OR Female)
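With dismax the usual split is: the free-text box goes in q (dismax fans it out over the qf fields), and each structured input becomes its own fq clause, which Solr intersects (ANDs) automatically. A JDK-only sketch of assembling the parameters (the field names follow the message above and are illustrative; multi-select values are OR'd inside a single fq):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DismaxParams {
    // Assemble dismax parameters: free text in q, each structured
    // filter as its own fq (Solr ANDs separate fq clauses together).
    static Map<String, List<String>> build(String freeText, List<String> statuses, String name) {
        Map<String, List<String>> p = new LinkedHashMap<String, List<String>>();
        p.put("qt", single("dismax"));
        p.put("q", single(freeText));
        List<String> fqs = new ArrayList<String>();
        if (!statuses.isEmpty()) {
            // multi-select dropdown: OR the values inside one fq clause
            fqs.add("status:(" + join(statuses, " OR ") + ")");
        }
        if (name != null) {
            fqs.add("name:\"" + name + "\"");
        }
        p.put("fq", fqs);
        return p;
    }

    static List<String> single(String s) {
        List<String> l = new ArrayList<String>();
        l.add(s);
        return l;
    }

    static String join(List<String> parts, String sep) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) sb.append(sep);
            sb.append(parts.get(i));
        }
        return sb.toString();
    }
}
```

Because fq clauses are always intersected, no explicit AND between the dropdown filters is needed; only the OR inside a multi-valued filter has to be written out.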
-- 
View this message in context: 
http://old.nabble.com/dismax-query-syntax-to-replace-standard-query-tp26631725p26631725.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dismax query syntax to replace standard query

2009-12-04 Thread javaxmlsoapdev

Thanks. When I do it that way, it gives me the following query.

params={indent=on&start=0&q=risk+test&qt=dismax&fq=statusName:(Male+OR+Female)+name:"Joe"&hl=on&rows=10&version=2.2}
hits=63 status=0 QTime=54 

I typed 'Risk test' (no quotes) into the text field in the UI. I want the
search to AND the "statusName" and "name" attributes (all attributes in the
fq param).

Following is my dismax configuration in solrconfig.xml


<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">title^2 description</str>
    <str name="pf">title description</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">title description</str>
    <int name="hl.snippets">10</int>
    <str name="hl.alternateField">title</str>
    <str name="hl.fragmenter">regex</str>
  </lst>
</requestHandler>

And schema.xml has:
<defaultSearchField>title</defaultSearchField>
<solrQueryParser defaultOperator="OR"/> -- when I change this to AND, it ANDs
all params in fq and also ANDs the words in the text field (e.g. "risk+test"),
and doesn't return results.

Basically, I want to OR the words in the "q" parameter and AND the params in
the "fq" list.

Any pointers would be appreciated.

Thanks,


isugar wrote:
> 
> I believe you need to use the fq parameter with dismax (not to be confused
> with qf) to add a "filter query" in addition to the q parameter.
> 
> So your text search value goes in q parameter (which searches on the
> fields
> you configure) and the rest of the query goes in the fq.
> 
> Would that work?
> 
> On Thu, Dec 3, 2009 at 7:28 PM, javaxmlsoapdev  wrote:
> 
>>
>> I have configured dismax handler to search against both "title" &
>> "description" fields now I have some other attributes on the page e.g.
>> "status", "name" etc. On the search page I have three fields for user to
>> input search values
>>
>> 1)Free text search field (which searchs against both "title" &
>> "description")
>> 2)Status (multi select dropdown)
>> 3)name(single select dropdown)
>>
>> I want to form query like textField1:value AND status:(Male OR Female)
>> AND
>> name:"abc". I know first (textField1:value searchs against both "title" &
>> "description" as that's how I have configured dixmax in the
>> configuration)
>> but not sure how I can AND other attributes (in my case "status" &
>> "name")
>>
>> note; standadquery looks like following (w/o using dixmax handler)
>> title:"test"description:"test"name:"Joe"statusName:(Male OR Female)
>> --
>> View this message in context:
>> http://old.nabble.com/dismax-query-syntax-to-replace-standard-query-tp26631725p26631725.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/dismax-query-syntax-to-replace-standard-query-tp26631725p26635928.html
Sent from the Solr - User mailing list archive at Nabble.com.



how to set multiple fq while building a query in solrj

2009-12-04 Thread javaxmlsoapdev

How do I create a query string with multiple fq params using the SolrJ
SolrQuery API?

e.g. I want to build a query as follow

http://servername:port/solr/issues/select/?q=testing&fq=statusName:(Female
OR Male)&fq=name="Joe"

I am using the SolrJ client API to build the query, using SolrQuery as follows:

solrQuery.setParam("fq", statusString);
solrQuery.setParam("fq", nameString);

It only sets the last "fq" (fq=nameString) in the string. If I switch the
setParam order, it sets fq=statusString. How do I set multiple fq params on a
SolrQuery object?
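For the record, SolrQuery has addFilterQuery(...), which appends a filter query instead of overwriting the way repeated setParam("fq", ...) calls do; on the wire this simply repeats the fq parameter. A JDK-only sketch of what the resulting URL looks like (host, core, and field names are placeholders):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

public class MultiFqUrl {
    // Build a Solr select URL that repeats the fq parameter once per filter,
    // which is what SolrQuery.addFilterQuery() produces under the hood.
    static String buildUrl(String base, String q, List<String> filterQueries)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(base)
                .append("?q=").append(URLEncoder.encode(q, "UTF-8"));
        for (String fq : filterQueries) {
            sb.append("&fq=").append(URLEncoder.encode(fq, "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String url = buildUrl("http://localhost:8983/solr/issues/select", "testing",
                java.util.Arrays.asList("statusName:(Female OR Male)", "name:\"Joe\""));
        System.out.println(url);
    }
}
```

Each repeated fq is parsed as a separate filter and the filters are intersected, which gives the ANDing between statusString and nameString that the question asks for.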

Thanks,
-- 
View this message in context: 
http://old.nabble.com/how-to-set-multiple-fq-while-building-a-query-in-solrj-tp26638650p26638650.html
Sent from the Solr - User mailing list archive at Nabble.com.



store content only of documents

2009-12-15 Thread javaxmlsoapdev

I store the document in a "content" field, defined as follows in schema.xml:

<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

and the following in solrconfig.xml:

<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">content</str>
    <str name="defaultField">content</str>
  </lst>
</requestHandler>

I want to store only the content in this field, but it stores other document
metadata too, e.g. "Author", "timestamp", "document type", etc. How can I ask
Solr to store only the body of the document in this field and not the other
metadata?
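If the extra values are Tika metadata being funneled in by a catch-all mapping, one approach (a sketch; the ignored_ prefix and the dynamic field are assumptions to adapt) is to map only the extracted body to your field and route every other extracted metadata field to an ignored dynamic field via uprefix:

```xml
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map only the extracted body into the content field -->
    <str name="fmap.content">content</str>
    <!-- prefix every other metadata field so it matches an ignored dynamicField -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

This would be paired in schema.xml with something like <dynamicField name="ignored_*" type="ignored" multiValued="true"/>, assuming an "ignored" field type (neither indexed nor stored) like the one the example schema ships with.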

Thanks,

-- 
View this message in context: 
http://old.nabble.com/store-content-only-of-documents-tp26803101p26803101.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: store content only of documents

2009-12-17 Thread javaxmlsoapdev

Anyone?

javaxmlsoapdev wrote:
> 
> I store the document in a "content" field, defined as follows in schema.xml:
> <field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
>
> and the following in solrconfig.xml:
> <requestHandler name="/update/extract"
>     class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>     <str name="fmap.content">content</str>
>     <str name="defaultField">content</str>
>   </lst>
> </requestHandler>
> 
> I want to store only "content" into this field but it store other meta
> data of a document e.g. "Author", "timestamp", "document type" etc. how
> can I ask solr to store only body of document into this field and not
> other meta data?
> 
> Thanks,
> 
> 

-- 
View this message in context: 
http://old.nabble.com/store-content-only-of-documents-tp26803101p26834525.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Searching .msg files

2009-12-17 Thread javaxmlsoapdev


1) Use Tika to index the .msg files (Tika supports the Microsoft Outlook
format, and I am already using Tika: http://lucene.apache.org/tika/formats.html).
2) While indexing, you'll have to write a handler to extract the To, CC, and
BCC values and store them in separate fields in the index.
3) When a user searches the .msg files, check whether s/he is in the To, CC,
or BCC field before returning a result to the page, and filter results
accordingly.
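Step 3 is usually cheapest as a filter query built from the current user, rather than post-filtering in application code. A JDK-only sketch of building that fq string (the to/cc/bcc field names are assumptions carried over from step 2):

```java
public class MailAclFilter {
    // Build an fq clause restricting results to messages where the
    // current user appears in the to, cc, or bcc field.
    static String aclFilterQuery(String userEmail) {
        String quoted = "\"" + userEmail + "\"";
        return "to:" + quoted + " OR cc:" + quoted + " OR bcc:" + quoted;
    }

    public static void main(String[] args) {
        // Passed to Solr as fq=..., alongside the user's free-text q.
        System.out.println(aclFilterQuery("user@example.com"));
    }
}
```

Pushing the restriction into fq means Solr never scores or returns mail the user was not a recipient of, and the filter is cached independently of the free-text query.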



Abhishek Srivastava-2 wrote:
> 
> Hello Everyone,
> 
> In my company, we store a lot of old emails (.msg files) in a database
> (done
> for the purpose of legal compliance).
> 
> The users have been asking us to give search functionality on the old
> emails.
> 
> One of the primary requirement is that when people search, they should
> only
> be able to search in their own emails (emails in which they were in the
> to,
> cc or bcc list).
> 
> How can solr be used?
> 
> from what I know about this product is that it only searches xml
> content...
> so I will have to extract the body of the email and convert it to xml
> right?
> 
> How will I limit the search results to only those emails where the user
> who
> is searching was in the to, cc or bcc list?
> 
> Please do recommend me an approach for providing a solution to our
> requirement.
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Searching-.msg-files-tp26788199p26835015.html
Sent from the Solr - User mailing list archive at Nabble.com.



Build index by consuming web service

2009-12-30 Thread javaxmlsoapdev

I am in need of a handler that consumes a web service and builds an index
from the results the service returns. Until now I was building the index by
reading data directly from a database query using DataImportHandler.

There are new functional requirements to index calculated fields and allow
search on them. I have exposed an application API as a web service, which
returns all attributes for indexing. How can I ask Solr to consume this
service and index the attributes it returns?

Any pointers would be appreciated.
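One option, since DataImportHandler in 1.4 is not set up to call an arbitrary web service, is to invert the flow: have the application consume the service itself and POST a standard XML update to /update (or build SolrInputDocuments with SolrJ). A minimal, stdlib-only sketch of building the update payload from the attribute maps the service might return (field names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AddXmlBuilder {
    // Escape the three characters that matter inside XML text nodes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a Solr <add> payload, one <doc> per map of field name -> value.
    static String toAddXml(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> doc : docs) {
            sb.append("<doc>");
            for (Map.Entry<String, String> e : doc.entrySet()) {
                sb.append("<field name=\"").append(e.getKey()).append("\">")
                  .append(escape(e.getValue())).append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("key", "42");
        doc.put("calculatedScore", "3.7"); // a calculated field from the service
        System.out.println(toAddXml(java.util.Collections.singletonList(doc)));
    }
}
```

The payload would then be POSTed to http://host:port/solr/core/update followed by a commit; this keeps the calculated-field logic in the application instead of inside a custom Solr handler.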

Thanks,


-- 
View this message in context: 
http://old.nabble.com/Build-index-by-consuming-web-service-tp26970642p26970642.html
Sent from the Solr - User mailing list archive at Nabble.com.



weird text stripping issue

2010-01-28 Thread javaxmlsoapdev

I am observing a very weird text stripping issue.

When I search for the word "Search" I get the following:

  Issue 18 Search String 
  4688 
  Issue 18 Search String2 
  

And the highlighting node:

  
Issue 18 Search String2 
  
  
Issue 18 Search String 
  

My actual description string is "Issue 18 Search String2"; the "2" isn't
coming back in the description attribute in my search results. Note: title
and description are both fields Solr searches against; that's my default
config.

Also note that description is of type "clob" in my config, as below.


However, when I search on other attributes (excluding title and description),
the returned result brings back the full description text, including the "2".

Any idea what's wrong going on here?

Thanks,
-- 
View this message in context: 
http://old.nabble.com/weird-text-stripping-issue-tp27363086p27363086.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: weird text stripping issue

2010-01-28 Thread javaxmlsoapdev

Analyzers are the defaults. Anything in particular to look for?

ANKITBHATNAGAR wrote:
> 
> 
> Check your analyzers
> 
> Ankit
> 
> -Original Message-
> From: javaxmlsoapdev [mailto:vika...@yahoo.com] 
> Sent: Thursday, January 28, 2010 4:46 PM
> To: solr-user@lucene.apache.org
> Subject: weird text stripping issue
> 
> 
> I am observing very weird text stripping issue. 
> 
> when I search for word "Search" I get following 
> 
>   Issue 18 Search String 
>   4688 
>   Issue 18 Search String2 
>   
> 
> And highliting node
> 
>   
> Issue 18 Search String2 
>   
>   
> Issue 18 Search String 
>   
> 
> My actual description string is "Issue 18 Search String2" # 2 isn't coming
> up back in description attribute in my search results. note; both title &
> description are the fields solr searches against. that's how my default
> config is.
> 
> also note description is of type "clob" in my config as below.
> 
> 
> however when I search on other attributes (excluding title & description)
> then returning result brings back full description text including "2". 
> 
> Any idea what's wrong going on here?
> 
> Thanks,
> -- 
> View this message in context:
> http://old.nabble.com/weird-text-stripping-issue-tp27363086p27363086.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/weird-text-stripping-issue-tp27363086p27365621.html
Sent from the Solr - User mailing list archive at Nabble.com.



Any idea what could be wrong with this fq value?

2010-02-03 Thread javaxmlsoapdev

Following is my solr URL. 

http://hostname:port/solr/entities/select/?version=2.2&start=0&indent=on&qt=dismax&rows=60&fq=statusName:(Open
OR Cancelled)&debugQuery=true&q=dev&fq=groupName:"Infrastructure" 

"groupName" is one of the attributes I create an fq (filter query) on. This
field (groupName) is being indexed and stored. 

When I search for anything other than "Infrastructure" in the fq on groupName,
Solr brings back correct results. When I pass "Infrastructure" in
fq=groupName:"Infrastructure" it never brings anything back. If I remove the
"fq" completely it brings back all results, including records with
groupName:"Infrastructure". Something is wrong only with this "Infrastructure"
value in the fq. 

Any idea what could be going wrong? Clearly this is only related to the value
"Infrastructure" in the filter query.

Thanks,

-- 
View this message in context: 
http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27437723.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Any idea what could be wrong with this fq value?

2010-02-03 Thread javaxmlsoapdev

Thanks Erik for the pointer. I had this field as "text" and after changing it
to "string" it started working as expected. 

I am still not sure why this particular value ("Infrastructure") was failing
to bring back results. Other values like "Network", "Information", etc. worked
fine when the field was of type "text" as well.

I tried (when groupName was of type "text")
&q=*:*&facet=on&facet.field=groupName and it brought back
"Infrastructure" correctly.

Can you explain how Solr indexed this attribute differently internally, and
why changing from "text" to "string" made it work?

Thanks,

Erik Hatcher-4 wrote:
> 
> is groupName a "string" field?  If not, it probably should be.  My  
> hunch is that you're analyzing that field and it is lowercased in the  
> index, and maybe even stemmed.
> 
> Try &q=*:*&facet=on&facet.field=groupName to see all the *indexed*  
> values of the groupName field.
> 
>   Erik
> 
> On Feb 3, 2010, at 10:05 AM, javaxmlsoapdev wrote:
> 
>>
>> Following is my solr URL.
>>
>> http://hostname:port/solr/entities/select/? 
>> version=2.2&start=0&indent=on&qt=dismax&rows=60&fq=statusName:(Open
>> OR Cancelled)&debugQuery=true&q=dev&fq=groupName:"Infrastructure“
>>
>> “groupName” is one of the attributes I create fq (filterQuery) on.  
>> This
>> field(groupName) is being indexed and stored.
>>
>> When I search for anything else other than “Infrastructure” in fq  
>> groupName
>> Solr brings me back correct results. When I pass in “Infrastructure”  
>> in the
>> fq=groupName:"Infrastructure“ it never brings me anything back. If I  
>> remove
>> “fq” completely it will bring me all results including records with
>> groupName:"Infrastructure“. Something is wrong only with this
>> “Infrastructure” value in the fq.
>>
>> Any idea what wrong could be happening. Clearly this is only related  
>> to
>> value "Infrastructure“ in the filter query.
>>
>> Thanks,
>>
>> -- 
>> View this message in context:
>> http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27437723.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27439279.html
Sent from the Solr - User mailing list archive at Nabble.com.
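
[Editor's sketch, not from the thread: Erik's hunch can be illustrated roughly. A "text" field type runs values through an analyzer at index time (lowercasing and stemming), so the indexed terms differ from the stored raw value, while a "string" field indexes the value verbatim; faceting shows *indexed* terms, which is why Erik suggested it. The one-rule stemmer below is a hypothetical stand-in for Solr's Porter stemmer, and a real dismax fq is also analyzed at query time, so the exact failure mode depends on both analysis chains.]

```python
# Toy illustration of index-time analysis: a "text" field stores analyzed
# terms, a "string" field stores the raw value. NOT Solr's real analyzer.
def analyze(value):
    token = value.lower()
    # hypothetical one-rule stem: drop a trailing "e" (Porter does this
    # for "infrastructure" too, yielding "infrastructur")
    return token[:-1] if token.endswith("e") else token

def indexed_terms(field_type, value):
    """Terms that end up in the index for one field value."""
    return {value} if field_type == "string" else {analyze(value)}

print(indexed_terms("text", "Infrastructure"))    # {'infrastructur'}
print(indexed_terms("string", "Infrastructure"))  # {'Infrastructure'}
```

So for a "text" field the literal token `Infrastructure` never exists in the index, which is exactly what `&q=*:*&facet=on&facet.field=groupName` exposes.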



DataImportHandler can't understand query

2010-02-08 Thread javaxmlsoapdev

I have a complex query (it runs fine in the database) which I am trying to
include in a DataImportHandler query. The query has case statements with < > in it 

e.g.

case when (ASSIGNED_TO < > '' and TRANSLATE(ASSIGNED_TO, '',
'0123456789')='')

DataImportHandler fails to parse the query, producing the following error
complaining about the "<" symbol. How do I get around this? Note: the query is
valid and runs fine in the database.

[Fatal Error] :26:26: The value of attribute "query" associated with an
element type "entity" must not contain the '<' character.
Feb 8, 2010 6:02:09 PM org.apache.solr.handler.dataimport.DataImportHandler
inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
occurred while initializing context
at
org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:190)

Thanks,
-- 
View this message in context: 
http://old.nabble.com/DataImportHandler-can%27t-understand-query-tp27507918p27507918.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler can't understand query

2010-02-08 Thread javaxmlsoapdev

Note: I already tried to escape the < character as \< but it still throws the
same error.

Any idea?

Thanks,

javaxmlsoapdev wrote:
> 
> I have a complex query (runs fine in database), which I am trying to
> include in DataImportHandler query. Query has case statements with < > in
> it 
> 
> e.g.
> 
> case when (ASSIGNED_TO < > '' and TRANSLATE(ASSIGNED_TO, '',
> '0123456789')='')
> 
> DataImportHandler failes to understand query with following error and
> complaining about "<" symbol. How to go about this? Note; query is valid
> and runs fine in database.
> 
> [Fatal Error] :26:26: The value of attribute "query" associated with an
> element type "entity" must not contain the '<' character.
> Feb 8, 2010 6:02:09 PM
> org.apache.solr.handler.dataimport.DataImportHandler inform
> SEVERE: Exception while loading DataImporter
> org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
> occurred while initializing context
>   at
> org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:190)
> 
> Thansk,
> 

-- 
View this message in context: 
http://old.nabble.com/DataImportHandler-can%27t-understand-query-tp27507918p27508214.html
Sent from the Solr - User mailing list archive at Nabble.com.
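
[Editor's note, not from the thread: the error comes from the XML parser, not from DataImportHandler's SQL handling. Because data-config.xml is parsed as XML, a `<` inside the query attribute must be written as the XML entity `&lt;`; backslash escaping is a shell/regex convention the XML parser does not know. A sketch with hypothetical entity, table, and column names:]

```xml
<!-- data-config.xml: write the SQL operator's "<" as &lt; inside the
     attribute value; ">" is legal in an attribute and may stay as-is -->
<entity name="item"
        query="SELECT ID, ASSIGNED_TO,
                      case when (ASSIGNED_TO &lt; > '' and
                                 TRANSLATE(ASSIGNED_TO, '', '0123456789') = '')
                           then 1 else 0 end AS ASSIGNED_TO_IS_NUMERIC
               FROM ITEMS"/>
```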



Good literature on search basics

2010-02-12 Thread javaxmlsoapdev

Does anyone know of good literature (web resources, books, etc.) on the basics
of search? I do have the Solr 1.4 and Lucene books but wanted to go into more
detail on the basics. 

Thanks,
-- 
View this message in context: 
http://old.nabble.com/Good-literature-on-search-basics-tp27562021p27562021.html
Sent from the Solr - User mailing list archive at Nabble.com.



HttpDataSource consume REST API with Authentication required

2010-03-04 Thread javaxmlsoapdev

I have to use HttpDataSource
(http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource)
to ask Solr to consume my REST service and index the data returned from that
service. My application/service requires authentication/authorization, so when
Solr invokes this service it MUST present valid credentials. How/where do I
configure or write the authentication part so that Solr can consume my REST
service? 

Any pointers would be appreciated.

Thanks,

-- 
View this message in context: 
http://old.nabble.com/HttpDataSource-consume-REST-API-with-Authentication-required-tp27785340p27785340.html
Sent from the Solr - User mailing list archive at Nabble.com.
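
[Editor's sketch, not from the thread: Solr 1.4's stock HttpDataSource exposes no authentication settings, so contemporary answers generally meant either embedding credentials in a form the service accepts in the `url` attribute, or writing a custom DataSource that adds an Authorization header to each request. For HTTP Basic auth that header is just base64 of `user:password`, sketched here; the function names and credentials are hypothetical.]

```python
# What a custom DIH DataSource (or any HTTP client) would need to send for
# HTTP Basic auth: an Authorization header carrying base64(user:password).
import base64
import urllib.request

def basic_auth_header(user, password):
    """Build the (name, value) pair for an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return ("Authorization", "Basic " + token)

def opener_with_auth(user, password):
    """urllib opener that presents the credential on every request."""
    opener = urllib.request.build_opener()
    opener.addheaders = [basic_auth_header(user, password)]
    return opener
```

Note that Basic auth sends the credential with every request, so it should only cross the wire over HTTPS or a trusted internal network.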