Indexing RestFul Json Service

2013-12-13 Thread Pritesh Patel
Does anybody know how to index JSON data coming from a RESTful service?  I
see I can use the DIH to index XML data, but what about JSON?

--Pritesh


Dataimport handler Date

2014-03-06 Thread Pritesh Patel
I'm using the DataImportHandler to index data from a MySQL DB.  It has been
running just fine with full-imports. I'm now trying to implement the
delta-import functionality.

To implement the delta query, you need to read the last_index_time
from a properties file to know what is new to index.  So I'm using the
parameter:
${dataimporter.last_index_time} within my query.

The problem is that when I use this, the date is always "Thu Jan 01 00:00:00
UTC 1970".  It's never actually reading the correct date stored in the
dataimport.properties file.

So my delta query does not work.  Has anybody seen this issue?

Seems like it's always using the start of the Unix epoch, i.e. timestamp
0.
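
As a quick check of that reading, here is a minimal Python sketch (outside Solr, not part of the DIH setup) that formats Unix timestamp 0 in the same layout as the date quoted above. The function name is made up for illustration.

```python
from datetime import datetime, timezone


def format_epoch_seconds(epoch_seconds: int) -> str:
    """Format a Unix timestamp in UTC using the layout seen in the message above."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Day name, month name, zero-padded day, time, literal "UTC", year.
    return moment.strftime("%a %b %d %H:%M:%S UTC %Y")


if __name__ == "__main__":
    print(format_epoch_seconds(0))
```

If this prints the same string as the one the import shows, the variable is probably resolving to a default (zero) value rather than to the timestamp written in dataimport.properties.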

--Pritesh

P.S.  If you want to see the delta query, see below.

deltaQuery="SELECT node.nid from node where node.type = 'news' and
node.status = 1 and (node.changed >
UNIX_TIMESTAMP('${dataimporter.last_index_time}') or node.created >
UNIX_TIMESTAMP('${dataimporter.last_index_time}'))"

deltaImportQuery="SELECT node.nid, node.vid, node.type, node.language,
node.title, node.uid, node.status,
FROM_UNIXTIME(node.created,'%Y-%m-%dT%TZ') as created,
FROM_UNIXTIME(node.changed,'%Y-%m-%dT%TZ') as changed, node.comment,
node.promote, node.moderate, node.sticky, node.tnid, node.translate,
content_type_news.field_image_credit_value,
content_type_news.field_image_caption_value,
content_type_news.field_subhead_value,
content_type_news.field_author_value,
content_type_news.field_dateline_value,
content_type_news.field_article_image_fid,
content_type_news.field_article_image_list,
content_type_news.field_article_image_data,
content_type_news.field_news_blurb_value,
content_type_news.field_news_blurb_format,
content_type_news.field_news_syndicate_value,
content_type_news.field_news_video_reference_nid,
content_type_news.field_news_inline_location_value,
content_type_news.field_article_contributor_nid,
content_type_news.field_news_title_value, page_title.page_title FROM node
LEFT JOIN content_type_news ON node.nid = content_type_news.nid LEFT JOIN
page_title ON node.nid = page_title.id where node.type = 'news' and
node.status = 1 and node.nid = '${deltaimport.delta.nid}'"


solr-map-reduce API

2014-10-29 Thread Pritesh Patel
What exactly does this API do?

--Pritesh


Internals of Analysis and Token Matching

2014-11-17 Thread Pritesh Patel
Hi Community.

Hoping someone can help explain this ...

Once all the analysis is done on a field all the tokens to identify that
field are stored.  What else is affecting a match to the document beyond a
simple token match and frequency of terms that match?

All the searches I did produce the same tokens (verified by using the
analysis screen in the admin, and looking at the terms indexed in Solr
through the schema browser for the field).  But some match and some don't when
I actually do the search.  I don't know why some of the searches don't
match even though everything in the analysis tells me they have the same
tokens.  What am I missing?
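
A rough way to cross-check this outside Solr is sketched below in Python. It only strips non-digit characters from the indexed value and from the five searches listed under *Descriptions*, then compares the results. This is a stand-in and not the Solr analysis chain: the helper `digits_only` is hypothetical, and the real filters may emit several tokens with positions rather than one string.

```python
import re

# Value stored in the field, and the searches tried against it.
INDEXED = "4048860461"
QUERIES = [
    "4048860461",
    "(404)8860461",
    "404-886-0461",
    "404)8860461",
    "404)886)0461",
]


def digits_only(text: str) -> str:
    """Drop every character that is not a digit (crude stand-in for analysis)."""
    return re.sub(r"\D", "", text)


if __name__ == "__main__":
    for query in QUERIES:
        normalized = digits_only(query)
        # Show whether the normalized query equals the indexed value.
        print(f"{query!r} -> {normalized} same={normalized == INDEXED}")
```

All five searches reduce to the same digit string here, which fits what the analysis screen reports. The difference between matching and non-matching searches may therefore come from something other than the final terms, for example how the query parser splits and positions the pieces.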

*Descriptions*

*Indexed in a field*: "4048860461"

*Searches that Match*
"4048860461"
"(404)8860461"

*Searches that don't match*
"404-886-0461"
"404)8860461"
"404)886)0461"

*Field analysis*
Field analysis is pretty simple: I just used the "text_en_splitting_tight"
field type but added an "ngram" filter to it.  See below.

  <tokenizer class="solr.WhitespaceTokenizerFactory"/>