Hey,

I think you might be overthinking this. Tweets are structured: you have the content (the tweet), the user who tweeted it, and various other metadata. So your 'document' might look like this:

<add>
<doc>
<field name="tweetId">ABCD1234</field>
<field name="tweet">I bought some apples</field>
<field name="user">JohnnyBoy</field>
</doc>
</add>

To get this structure, you can use any programming language you're comfortable with and load it into Solr by various means (for example, by POSTing the XML to Solr's update handler). You can also add more 'meta' fields that you get from Twitter if you want.
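
As a rough illustration, here is a minimal Python sketch of that step, not the only way to do it. It builds the <add> message above from plain dictionaries and, optionally, POSTs it to Solr. The function names (tweets_to_add_xml, post_to_solr) are made up for this example, and the URL assumes a default single-core Solr running on localhost:8983 with these fields defined in the schema; adjust both to your setup.

```python
import urllib.request
import xml.etree.ElementTree as ET


def tweets_to_add_xml(tweets):
    """Build a Solr <add> message from a list of dicts (field name -> value)."""
    add = ET.Element("add")
    for tweet in tweets:
        doc = ET.SubElement(add, "doc")
        for name, value in tweet.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_body, url="http://localhost:8983/solr/update?commit=true"):
    """POST the XML to Solr's update handler.

    The default URL is an assumption (local, single-core install).
    """
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


tweets = [
    {"tweetId": "ABCD1234", "tweet": "I bought some apples", "user": "JohnnyBoy"},
]
xml_body = tweets_to_add_xml(tweets)
print(xml_body)
# post_to_solr(xml_body)  # uncomment once Solr is running
```

Building the XML with a library rather than by string concatenation matters for tweets in particular, since they often contain characters like & and < that have to be escaped.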

David

On 28/05/2012 9:37 PM, Giovanni Gherdovich wrote:
Hi all.

I am in the process of setting up Solr for my application,
which is full text search on a bunch of tweets from twitter.

I am afraid I am missing something.
From the book I am reading, "Apache Solr 3 Enterprise Search Server",
it looks like Solr works with structured input, like XML or CSV,
while I have the most wild and unstructured input ever (tweets).
A section named "Indexing documents with Solr Cell" seems to address my problem,
but also shows that before getting to Solr, I might need to use
another Apache tool called Tika.

Can anybody provide a brief explanation of the general picture?
Can I index my tweets with Solr?
Or do I also need to put Tika in my pipeline?

Best regards,
Giovanni Gherdovich
