Re: indexing unstructured text (tweets)

Giovanni Gherdovich Mon, 28 May 2012 07:35:47 -0700

Hello Jack and Anuj,

2012/5/28 Jack Krupansky <[email protected]>:
> The Twitter API extracts hash tag and user mentions for you, in addition to
> giving you the full raw text. You'll have to read up on the Twitter API.


That's what I thought just after hitting "send" on the message above ;-)
I am pretty sure the Twitter API format maps very nicely onto a suitable
input format for Solr, if it is not already good enough to be fed
into Solr directly.

I am a bit unlucky here because I have been provided with
only the raw text for about 1.5 million tweets; so I would have
to write a few lines of code to restore at least user mentions,
hashtags and URLs.
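
As a rough idea of what those few lines could look like, here is a
minimal Python sketch. It assumes simplified regular expressions (the
real Twitter tokenization rules are more involved), and both the function
name extract_entities and the field names in the returned dict are
made up for illustration; they are not taken from the Twitter API or
from any Solr schema.

```python
import re

# Simplified patterns (assumption): Twitter's own entity extraction is stricter.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")      # skips the '@' inside e-mail addresses
HASHTAG_RE = re.compile(r"(?<![\w&])#(\w+)")   # skips HTML entities such as &#39;


def extract_entities(text):
    """Return a dict with the raw text plus the entities found in it.

    The keys are hypothetical field names, chosen only for this sketch.
    """
    # Drop trailing punctuation that usually belongs to the sentence, not the URL.
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    # Blank out URLs first so a '#fragment' in a URL is not counted as a hashtag.
    stripped = URL_RE.sub(" ", text)
    return {
        "text": text,
        "mentions": MENTION_RE.findall(stripped),
        "hashtags": HASHTAG_RE.findall(stripped),
        "urls": urls,
    }


if __name__ == "__main__":
    sample = "RT @jack: indexing #solr tips http://example.com/a#b cc @anuj_k"
    print(extract_entities(sample))
```

Each resulting dict could then be turned into one Solr document, with
the mention, hashtag and URL lists going into multi-valued fields.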


2012/5/28 Anuj Kumar <[email protected]>:
> This is a bit old but provides good information for schema design-
> http://www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php
>
> Found this link as well- https://gist.github.com/702360
>
> The types of the field may depend on the search requirements.

Anuj, thanks for these very interesting links,
even though that kind of detail might already be covered
in the Twitter API docs.
Once I am done with my first Solr setup, I might
set up the whole pipeline (fetching the Twitter feeds myself)
on my machines, so that I can exploit all the
information that Twitter provides.

Cheers,
Giovanni
