I'm ecstatic to be able assist a little bit with the Social Capital Markets 2009 conference, particularly in community building. One of the tasks I'm working on involves cleaning up lists of people who are Tweeting.
If you've got a tool that does this already, good god, please let me know! But if you're like me and after some fruitless searching decided that misusing excel is always fun, here's an algorithm to follow:
- Search! (In my case, I search for #socap09: http://search.twitter.com/search?q=%23socap09 )
- Copy and paste the results into an excel spreadsheet. It will look dreadful, but don't worry.
- Sort on Column A
- On the DATA menu, choose "Text to Column...", choose to use delimiters, and then choose "space" and "other", where other is a colon (:)
- Sort on Column A
- There will be some number with a blank first cell. Select those blank cells (not the whole column!) On the EDIT menu, choose "Delete," and remove this first cell and choose "shift left."
- Sort on Column A
- Delete everything except column A. This will leave you with a list of TwitterIDs plus some dreck.
- Delete those things that are obviously not TwitterIDs, such as *, # and actual numbers.
- Again, choose text to column, using the delimiter "other" and chosing the underscore ( _ )
- You've now broken apart the picture file names into separate columns. TwitterIDs only need one column (usually, exception below). Sort on Column B
- The top results will almost all be file names for pictures, and the last word in the last column wll be "normal". The exception will be when someone uses an underscore in their TwitterID (such as the_real_shaq). There typically aren't many of those people compared to people with pictures; so scan and find those. Put the correct TwitterID in column A. (NOTE: finding those TwitterIDs with an underscore is the only manual process. If you can think of a way to do that in an automated way on excel, let me know.)
- Sort on Column B
- Remove everything with something in Column B (because you've rescued the real TwitterIDs)
- Go to the DATA menu and select PivotTable Report... and run this report. On the last step of the wizard, select "Layout," and drag whatever is available (should be a Twitter ID, in cell A1) to the side column of the pivot table and again into the center field of the report. Finish the pivot report.
- Copy the left column of the pivot table and from the EDIT menu, choose "Paste Special..." and select values.
Voila - a clean list.
Comments