_babel
Java exceptions, the server going down, and currently this:
{
"items" : [
{
"TDATE" : "0:00:00",
"MOD" : "D",
"STATION" : [
"EWTN (WEWN)",
"WEWN",
"WEWN EWTN Catholic R.",
"Radio Free Asia",
"CNR1 Jammer",
"IBB",
"R.FARDA",
"Radio Farda"
],
(wrong, as there's only one STATION per row. I suppose it's open source, so I could
install Babel locally and try to figure it out.)
_google-refine
With the latest snapshot, parse errors go unreported. They show up as entire
lines, or even the rest of the document, appearing in a single facet fieldname.
So I wrote a TSV parser that works on the
xls2txt (http://wizard.ae.krakow.pl/%7Ejb/xls2txt/) output of an XLS file from
hfskeds (http://www.hfskeds.com/skeds/):
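# Yields one (subject, predicate, object) triple per cell. `node` is the
# path of the file being read, supplied by the element framework.
def csv
  header, *body = File.readlines(node).map { |l| l.chomp.split("\t") } # tab-separated fields
  body.each_with_index do |row, n|
    row.each_with_index do |v, i|
      yield '#r' + n.to_s, header[i], v # subject = row id, predicate = column name
    end
  end
end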
This is turned into an in-memory RDF/JSON graph:
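# fromStream :: Graph -> tripleSource -> Graph
# i is the triple-source method name plus its arguments, invoked via send
def fromStream m, *i
  send(*i) do |s, p, o|
    m[s] ||= {'uri' => s} # one hash per subject
    m[s][p] ||= []        # each predicate maps to a list of objects
    m[s][p].push o
  end
  m
end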
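To see the parse-and-fold steps without the element framework, here is a minimal standalone sketch. It assumes the tab-separated text is already in memory; `tsv_triples` and `graph_from` are hypothetical names, and the `#r<n>` row ids follow the parser above.

```ruby
# Yields [subject, predicate, object] for every cell below the header row.
def tsv_triples(text)
  header, *body = text.lines.map { |l| l.chomp.split("\t") }
  body.each_with_index do |row, n|
    row.each_with_index do |v, i|
      yield "#r#{n}", header[i], v
    end
  end
end

# Folds the triples into an RDF/JSON-style hash:
# subject => { 'uri' => subject, predicate => [objects] }
def graph_from(text)
  m = {}
  tsv_triples(text) do |s, p, o|
    m[s] ||= { 'uri' => s }
    (m[s][p] ||= []) << o
  end
  m
end

g = graph_from("STATION\tkc/s\nWEWN\t5810\nIBB\t9500\n")
# g['#r0'] holds 'uri' => '#r0', 'STATION' => ['WEWN'], 'kc/s' => ['5810']
```

Every value stays wrapped in an array, so multi-valued properties need no special case until render time.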
and finally to Exhibit JSON via:
fn Render+'application/json+exhibit', ->d, e {
  fields = e.q['f'].do { |x| x.split /,/ } # optional whitelist of field ids (query param f)
  {items: d.values.map { |r|
    r.keys.-(['uri']).each { |k|
      f = k.frag.do { |x| (x.gsub /\W/, '').downcase } # alphanumeric id restriction
      if !fields || (fields.member? f)
        r[f] = r[k][0].to_s      # rename fieldnames, unwrap value
        r.delete k unless f == k # cleanup unless id same as before
      else
        r.delete k
      end }
    r[:label] = r.delete 'uri' # Exhibit only requires label
    r
  }}.to_json }
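For readers without element's `String#frag` and `Object#do` helpers, this standalone sketch performs the same renaming: field ids become the lowercased alphanumeric fragment of each predicate URI, values are unwrapped to their first object, and `uri` becomes `label`. `frag` and `exhibit_json` are hypothetical stand-ins, and unlike the lambda above the sketch builds a fresh item hash instead of mutating the graph.

```ruby
require 'json'

# Hypothetical stand-in for element's String#frag: the last segment after
# '#' or '/'.
def frag(uri)
  uri.split(/[#\/]/).last.to_s
end

# graph: subject => { 'uri' => subject, predicate URI => [objects] }
# fields: optional whitelist of field ids
def exhibit_json(graph, fields = nil)
  items = graph.values.map do |r|
    item = { 'label' => r['uri'] }             # Exhibit only requires label
    r.each do |k, v|
      next if k == 'uri'
      f = frag(k).gsub(/\W/, '').downcase       # alphanumeric id restriction
      item[f] = v[0].to_s if fields.nil? || fields.include?(f)
    end
    item
  end
  { items: items }.to_json
end

g = { '#r0' => { 'uri' => '#r0', 'http://example.org/ns#Station' => ['WEWN'] } }
puts exhibit_json(g)
# {"items":[{"label":"#r0","station":"WEWN"}]}
```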
The reason we massage the fieldnames is explained in this message:
http://www.mail-archive.com/[email protected]/msg01052.html
All of this is integrated into http://gitorious.org/element: drop a .tsv file
in a directory, add ?view=exhibit to the query string, and you get an Exhibit.
That brought me to the next problem: the browser froze for 90 seconds while
Exhibit did something (DOM generation and facet statistics, I guess).
I forget exactly what happened next, but I was already using dynamic
stylesheets in a mail app: each replied-to line is wrapped in class=quote, and
span.quote {display:none} is added to the document to hide them. It was pretty
obvious this would be faster than
Array.prototype.forEach.call(document.getElementsByClassName('quote'), function(el){ el.style.display = 'none' })
I decided to take the same approach to faceted filtering in the browser. I have
no idea whether my choices are the fastest, but they work, and I will probably
do further experiments (e.g., placing common facet values innermost or
outermost, à la the SPARQL trick of putting the smallest pattern first).
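A minimal sketch of the stylesheet side in Ruby, assuming each item is wrapped in one div per facet whose class is `<facet>_<value>` (a hypothetical naming scheme, not necessarily element's). Filtering then means emitting `display:none` rules for the unselected values rather than touching every node:

```ruby
# CSS-safe class for one facet value (assumed "<facet>_<value>" scheme).
def css_class(facet, value)
  "#{facet}_#{value}".gsub(/\W/, '_')
end

# all_values: facet => every value present in the data
# selected:   facet => values the user picked
# Facets with nothing selected hide nothing.
def filter_css(all_values, selected)
  rules = all_values.flat_map do |facet, values|
    chosen = selected.fetch(facet, [])
    next [] if chosen.empty?
    (values - chosen).map { |v| "div.#{css_class(facet, v)} {display:none}" }
  end
  rules.join("\n")
end

all = { 'STATION' => ['WEWN', 'IBB', 'Radio Farda'] }
puts filter_css(all, { 'STATION' => ['IBB'] })
# div.STATION_WEWN {display:none}
# div.STATION_Radio_Farda {display:none}
```

Because a hidden wrapper hides everything inside it, nesting one div per facet gives AND across facets and OR within a facet's selected values.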
The query-string parameter changes from view=exhibit to view=e.
If a= (a comma-separated list of predicate URIs) isn't specified, you are
presented with a list like:
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://rdfs.org/sioc/ns#addressed_to
http://rdfs.org/sioc/ns#has_creator
http://purl.org/rss/1.0/category
[Go]
Click the ones you want and [Go], at which point the left side fills with
facet-selector panes.
Custom views are selected with ev=board, following a convention of
view/board/base
view/board/item
where base is handed a function that it calls to output the items, wrapped in
the special divs that the CSS uses to filter.
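Here is a hypothetical base/item pair as plain Ruby functions (not element's actual view API): `item` draws one resource, and `base` nests the drawn HTML in one wrapper div per facet, using an assumed `<facet>_<value>` class scheme that a generated stylesheet could target.

```ruby
require 'erb'

# Hypothetical base: wraps each drawn item in one div per facet.
# The first facet in the list ends up innermost.
def base(items, facets, draw)
  items.map do |r|
    facets.reduce(draw.call(r)) do |html, f|
      classes = Array(r[f]).map { |v| "#{f}_#{v}".gsub(/\W/, '_') }.join(' ')
      %(<div class="#{classes}">#{html}</div>)
    end
  end.join("\n")
end

# Hypothetical item: draws a single resource (here just its escaped URI).
item = ->(r) { "<span>#{ERB::Util.html_escape(r['uri'])}</span>" }

rows = [{ 'uri' => '#r0', 'STATION' => ['WEWN'], 'LANGUAGE' => ['E'] }]
puts base(rows, ['STATION', 'LANGUAGE'], item)
# <div class="LANGUAGE_E"><div class="STATION_WEWN"><span>#r0</span></div></div>
```

The order of `facets` decides which facet is innermost, which is the knob the nesting experiments would turn.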
For example, in a music player /item draws a single playlist row:
http://blog.whats-your.name/public/smiths.png
Figuring out the result set is only half the battle for the browser; excessive
use of floats, relative sizes and so on becomes noticeable with huge data sets.
hfskeds is 30K rows by 22 columns, or 0.66 million triples, roughly the upper
bound of what I'd want to use on a netbook. It takes about 5 seconds to load a
doc and 0.8 seconds to redraw after a filter change.
You can squeeze out faster redraws with <pre>, fixed heights/widths, and
absolute positioning.
Shortwave schedules were the main dataset, so let's get into some of those:
http://blog.whats-your.name/public/25m.html
#!/bin/sh
# one page per shortwave band: band name, lower and upper edge in kc/s
while read band min max; do
  curl "http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=$min&minP=kc/s&maxP=kc/s&max=$max" > "$band.html"
done <<EOF
120m 2200 2500
90m 3100 3450
75m 3890 4000
60m 4740 5125
49m 5800 6300
40m 7200 7600
31m 9400 9999
25m 11500 12160
22m 13500 13900
19m 15100 15900
16m 17500 17900
EOF
This created an HTML file for each band, which I uploaded to the webserver.
As you can see, a default filter exists (maxP, minP, and matchP too), which is
handy for common uses.
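One way a minP/maxP range filter could work, sketched under assumptions: `q` is the parsed query string as a Hash, `m` is the graph hash, and resources lacking the named property are dropped. The function name is hypothetical.

```ruby
# Hypothetical range filter: keep a resource only if its minP value is at
# least min and its maxP value is at most max. Resources missing the named
# property are dropped. Mutates m in place and returns it.
def range_filter(m, q)
  m.delete_if do |_uri, r|
    out = false
    if q['min'] && q['minP']
      v = Array(r[q['minP']]).first
      out ||= v.nil? || v.to_f < q['min'].to_f
    end
    if q['max'] && q['maxP']
      v = Array(r[q['maxP']]).first
      out ||= v.nil? || v.to_f > q['max'].to_f
    end
    out
  end
end

graph = { '#a' => { 'kc/s' => ['2300'] }, '#b' => { 'kc/s' => ['5000'] } }
range_filter(graph, { 'min' => '2200', 'minP' => 'kc/s', 'max' => '2500', 'maxP' => 'kc/s' })
# graph now holds only '#a'
```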
Custom filters, activated via the query string (as a comma-separated list), can
also be written. Here is an excerpt from a sort of natural-language one, based
on the realization that any integer < 2400 in an email is probably a time, and
one > 2400 a frequency (minus a few false positives for phone numbers and
years):
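m[u] = {'uri' => u,
        'big' => l.scan(/\b[A-Z][A-Z][A-Z]+\b/), # runs of 3+ capitals (callsigns, acronyms)
        Content => l}
l.scan(/\d{4,}/) { |d| d = d.to_i
  if (d > 2400) && (d < 30000)   # above 2400: a frequency in kc/s
    m[u]['kc/s'] = [d]
  elsif d < 2400                 # below 2400: a time; crude end time of BTIM + 30
    m[u]['BTIM'] = [d]; m[u]['ETIM'] = [d + 30]
  end }
m.delete u unless ['BTIM', 'kc/s'].all? { |k| m[u].key? k } # need both a time and a frequency
)} # closes constructs opened before this excerpt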
A filter mutates the request-time JSON model however it sees fit, adding new
properties and so on.
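A sketch of how comma-separated filter names in the query string might be dispatched. The `filter` parameter, the `FILTERS` registry and both example filters are hypothetical; each filter is a lambda that mutates the model in place, applied in the order listed.

```ruby
# Hypothetical registry: filter name => lambda(model, query)
FILTERS = {
  # adds a new property: the frequency as an Integer
  'khz' => ->(m, _q) { m.each_value { |r| r['khz'] = r['kc/s'].first.to_i if r['kc/s'] } },
  # drops resources that carry no frequency at all
  'hasfreq' => ->(m, _q) { m.delete_if { |_uri, r| !r.key?('kc/s') } }
}.freeze

# Applies each named filter in order; unknown names are ignored.
def apply_filters(m, q)
  q.fetch('filter', '').split(',').each do |name|
    filter = FILTERS[name]
    next unless filter
    filter.call(m, q)
  end
  m
end

model = { '#a' => { 'kc/s' => ['5810'] }, '#b' => { 'STATION' => ['IBB'] } }
apply_filters(model, { 'filter' => 'hasfreq,khz' })
# model now holds only '#a', with 'khz' => 5810 added
```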
http://blog.whats-your.name/public/GlenDoes31.html
I did a few more of these, EiBi L and H:
http://blog.whats-your.name/public/eibiL.html (this is the largest one up now,
data-wise)
BBC: http://blog.whats-your.name/public/bbc.html
On to some other examples.
/t is a lifestream (http://www.cs.yale.edu/homes/freeman/dissertation/etf.pdf)
serving a time range of resources, with options for start/end, direction
(ascending/descending) and count. Here it is filtered by source:
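A rough sketch of the time-range query behind a lifestream, assuming each resource carries an ISO 8601 `date` value. The keyword names (`start`, `stop`, `direction`, `count`) are stand-ins for whatever /t actually accepts.

```ruby
require 'time'

# Returns up to `count` resources whose date lies in [start, stop],
# newest first for 'desc' and oldest first for 'asc'.
def lifestream(m, start: nil, stop: nil, direction: 'desc', count: 50)
  dated = m.values.filter_map do |r|
    d = Array(r['date']).first
    next unless d                       # undated resources are skipped
    [Time.iso8601(d), r]
  end
  dated.select! { |t, _| (start.nil? || t >= start) && (stop.nil? || t <= stop) }
  dated.sort_by! { |t, _| t }
  dated.reverse! if direction == 'desc'
  dated.first(count).map(&:last)
end
```

For example, `lifestream(graph, start: Time.iso8601('2011-01-16T00:00:00Z'), direction: 'asc', count: 20)` would page forward from a given moment.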
http://i574.photobucket.com/albums/ss187/ix9/hyper/2011-01-16-203039_1366x768_scrot.png
For this usage, always have your triple-izers add sioc:addressed_to and sioc:has_creator.
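As an illustration, a triple-izer for a mail message might emit those two SIOC properties like this. The message hash layout (`:id`, `:from`, `:to`) and the method name are assumptions; the predicate URIs are the SIOC ones from the facet list earlier.

```ruby
SIOC = 'http://rdfs.org/sioc/ns#'

# Hypothetical triple-izer: yields (subject, predicate, object) for one
# message, one addressed_to triple per recipient.
def mail_triples(msg)
  s = msg[:id]
  yield s, SIOC + 'has_creator', msg[:from]
  Array(msg[:to]).each { |to| yield s, SIOC + 'addressed_to', to }
end
```

Since it yields triples, it can serve as the triple source that fromStream folds into the graph.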
Examining /search shows us the top poster is Cory Doctorow (no surprise there):
http://i574.photobucket.com/albums/ss187/ix9/hyper/to.png
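Finding the top poster amounts to a facet count: tally each value of one predicate across the graph and sort by frequency. A minimal sketch over the RDF/JSON-style hash, with a hypothetical function name:

```ruby
# Returns [[value, count], ...], most frequent first (ties by value).
def facet_counts(m, predicate)
  counts = Hash.new(0)
  m.each_value { |r| Array(r[predicate]).each { |v| counts[v] += 1 } }
  counts.sort_by { |v, n| [-n, v.to_s] }
end
```

Calling `facet_counts(graph, 'http://rdfs.org/sioc/ns#has_creator').first` would give the top poster and their post count.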
I imported all Boing Boing posts for this one; that's discussed at
http://blog.whats-your.name/public/bb.html
A couple of possibilities:
Hash URIs for filters. I will wait for Exhibit 3.0 to come up with its
convention and use that, or just something like facet=val,val2&facet2=val3,val4
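The plain query-string option is easy to handle with Ruby's standard `URI.decode_www_form` and `URI.encode_www_form`. A sketch with hypothetical helper names; note that a literal comma inside a value cannot be expressed in this scheme, since values are split on ',' after decoding.

```ruby
require 'uri'

# "facet=val,val2&facet2=val3" -> { 'facet' => ['val', 'val2'], 'facet2' => ['val3'] }
def parse_facets(qs)
  URI.decode_www_form(qs).each_with_object({}) do |(facet, vals), h|
    (h[facet] ||= []).concat(vals.split(','))
  end
end

# Inverse: { 'facet' => ['val', 'val2'] } -> "facet=val%2Cval2"
def facet_qs(h)
  URI.encode_www_form(h.map { |facet, vals| [facet, vals.join(',')] })
end
```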
Visible set: jQuery has a :visible meta-selector, which I have not tried yet to
see how fast it is. It would be useful if you want to reserialize a document,
deleting all invisible (filtered) elements. We should probably make noise about
adding this right into CSS, as browsers likely have the feature already; e.g.,
Ctrl-F only searches visible elements.
"Just publish RDFa" would be cool: some JS that introspects a DOM and adds the
appropriate facet wrappers.
-c