svn blame not working for files which had binary mime-type in a previous revision

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

svn blame not working for files which had binary mime-type in a previous revision

Ferenc Kovacs
Hi,

I would like to bring up an old issue here:
I've just bumped into this, and first it was really confusing why would svn blame work with one xml file, but not with the other, as the 
Skipping binary file: 'reference/apc/book.xml'
wasn't really helpful.
I understand that the discussion back in 2004 didn't reached an agreement about how to resolve the issue, but could we at least change the error message to something like:
Skipping binary file: 'reference/apc/book.xml' as binary mime-type was found in revision X, use --force to proceed.

This would at least lessen the confusion on the users part.

ps: please cc me to the replies as I'm not subscribed to the list.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Stefan Sperling
On Thu, Jan 31, 2013 at 01:57:16PM +0100, Ferenc Kovacs wrote:

> Hi,
>
> I would like to bring up an old issue here:
> http://subversion.tigris.org/issues/show_bug.cgi?id=2089
> I've just bumped into this, and first it was really confusing why would svn
> blame work with one xml file, but not with the other, as the
> Skipping binary file: 'reference/apc/book.xml'
> wasn't really helpful.
> I understand that the discussion back in 2004 didn't reached an agreement
> about how to resolve the issue, but could we at least change the error
> message to something like:
> Skipping binary file: 'reference/apc/book.xml' as binary mime-type was
> found in revision X, use --force to proceed.
>
> This would at least lessen the confusion on the users part.
>
> ps: please cc me to the replies as I'm not subscribed to the list.

Sure, this is a small but useful improvement, and also serves as
a reference to star wars (ep. 4 to 6 only of course :)

I've made the message say the follwing (in http://svn.apache.org/r1441017):

  Skipping binary file (use --force to treat as text): 'foo/bar.xml'

Are you happy enough with this change being shipped in Subversion 1.8
when it is released? I don't think this qualifies as a bug fix we must
backport to the 1.7 series. This change affects translations, which are
rarely updated on release branches and I'd rather avoid breaking them.
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Ferenc Kovacs



On Thu, Jan 31, 2013 at 4:30 PM, Stefan Sperling <[hidden email]> wrote:
On Thu, Jan 31, 2013 at 01:57:16PM +0100, Ferenc Kovacs wrote:
> Hi,
>
> I would like to bring up an old issue here:
> http://subversion.tigris.org/issues/show_bug.cgi?id=2089
> I've just bumped into this, and first it was really confusing why would svn
> blame work with one xml file, but not with the other, as the
> Skipping binary file: 'reference/apc/book.xml'
> wasn't really helpful.
> I understand that the discussion back in 2004 didn't reached an agreement
> about how to resolve the issue, but could we at least change the error
> message to something like:
> Skipping binary file: 'reference/apc/book.xml' as binary mime-type was
> found in revision X, use --force to proceed.
>
> This would at least lessen the confusion on the users part.
>
> ps: please cc me to the replies as I'm not subscribed to the list.

Sure, this is a small but useful improvement, and also serves as
a reference to star wars (ep. 4 to 6 only of course :)

I've made the message say the follwing (in http://svn.apache.org/r1441017):

  Skipping binary file (use --force to treat as text): 'foo/bar.xml'

Are you happy enough with this change being shipped in Subversion 1.8
when it is released? I don't think this qualifies as a bug fix we must
backport to the 1.7 series. This change affects translations, which are
rarely updated on release branches and I'd rather avoid breaking them.

Hi Stefan,

I would be ok with shipping it in 1.8, and while I think that the suggested wording is better than the current as it suggest a solution, but personally I would be still confused why would subversion treat my file as binary while by all obvious means it is a text file.
It is really unintuitive to guess that blame doesn't work because the file HAD a binary mimetype in the past (and from a quick test it seems that the blame will be skipped even if the versions shown by blame all happened after the binary mime-type was removed).
So if possible I would prefer telling the user why is that happening.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Stefan Sperling
On Thu, Jan 31, 2013 at 04:44:39PM +0100, Ferenc Kovacs wrote:
> I would be ok with shipping it in 1.8, and while I think that the suggested
> wording is better than the current as it suggest a solution, but personally
> I would be still confused why would subversion treat my file as binary
> while by all obvious means it is a text file.

It's not obvious by all means that is a text file.

At any revision, the entire file content could be swapped out and binary
content be replaced with text and vice versa. In which case you'd set the
property in some revisions and not in others. It may make sense to
use 'svn delete' and 'svn add' in this case to make the replacement
explicit, but that's not a hard requirement.

The fact that someone told Subversion in the past that the file was binary
(by committing an svn:mime-type property with a non-text value) is part
of the history record and needs to be preserved in repository history.

If setting the property was a mistake, the --force option can be used
as a workaround. Or you can rewrite the repository history (e.g. with
svndumptool: http://svn.borg.ch/svndumptool/) to eradicate the mistake
from repository history.

I don't think there is anything generally wrong with Subversion's
behaviour here. It is being conservative in its assumptions about
what to do with data you give it for safekeeping, and it does what
it's being told to do (i.e. it is working as designed -- you may
disagree with this design but the current implementation reflects
the intended design).

> It is really unintuitive to guess that blame doesn't work because the file
> HAD a binary mimetype in the past (and from a quick test it seems that the
> blame will be skipped even if the versions shown by blame all happened
> after the binary mime-type was removed).

This probably happens because Subversion needs to evaluate the entire
operative revision range of the blame operation for changes, even
if a revision in the range doesn't actually contribute changes to
the file (or contributes changes that are reversed by later revisions).

If you blame a revision range where none of the versions of the file
has a binary svn:mime-type property set, but you still get the error,
then I agree that this smells like an issue worth investigating. Is this
the case? If so, can you provide a reproduction script for this problem?

> So if possible I would prefer telling the user why is that happening.

What exactly do you mean here?

If you mean that the error message should explain how binary files work,
I disagree. Users should be already informed about how Subversion handles
binary files when they run 'svn blame'. If users don't understand this
they should learn more about Subversion (e.g. by reading the documentation)
to make efficient use of it. I don't think having error messages substitute
for documentation is a good idea in general.
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Julian Foad
Stefan Sperling wrote:

> Ferenc Kovacs wrote:
>> It is really unintuitive to guess that blame doesn't work because the file
>> HAD a binary mimetype in the past (and from a quick test it seems that the
>> blame will be skipped even if the versions shown by blame all happened
>> after the binary mime-type was removed).
>
> This probably happens because Subversion needs to evaluate the entire
> operative revision range of the blame operation for changes, even
> if a revision in the range doesn't actually contribute changes to
> the file (or contributes changes that are reversed by later revisions).
>
> If you blame a revision range where none of the versions of the file
> has a binary svn:mime-type property set, but you still get the error,
> then I agree that this smells like an issue worth investigating. Is this
> the case? If so, can you provide a reproduction script for this problem?
>
>> So if possible I would prefer telling the user why is that happening.
>
> What exactly do you mean here?
>
> If you mean that the error message should explain how binary files work,
> I disagree. Users should be already informed about how Subversion handles
> binary files when they run 'svn blame'. If users don't understand this
> they should learn more about Subversion (e.g. by reading the documentation)
> to make efficient use of it. I don't think having error messages substitute
> for documentation is a good idea in general.

I agree with Ferenc's point that the reason for the error is obscure, and his request that the error message should indicate that the 'is binary' indication may come from some past revision.

- Julian

Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Ferenc Kovacs
In reply to this post by Stefan Sperling



On Thu, Jan 31, 2013 at 5:17 PM, Stefan Sperling <[hidden email]> wrote:
On Thu, Jan 31, 2013 at 04:44:39PM +0100, Ferenc Kovacs wrote:
> I would be ok with shipping it in 1.8, and while I think that the suggested
> wording is better than the current as it suggest a solution, but personally
> I would be still confused why would subversion treat my file as binary
> while by all obvious means it is a text file.

It's not obvious by all means that is a text file.

It is obvious to the user: he opens the file, it is a textfile, he does a 
file text.xml
it will print out
test.xml: XML  document text
He remembers that there is an svn:mime-type property, but nope, not set or set to text/something.
I don't really want to resurrect the old discussion where this was discussed in length, but from the user POV, this problem can occur when by all means, that is a text file.
 

At any revision, the entire file content could be swapped out and binary
content be replaced with text and vice versa. In which case you'd set the
property in some revisions and not in others. It may make sense to
use 'svn delete' and 'svn add' in this case to make the replacement
explicit, but that's not a hard requirement.

Yeah, this was brought up in the previous thread, there were examples like delphi project files and changing a gif to png and still keeping the history of the file.
That is all true, but doesn't really change the fact that this behavior won't be obvious to the user and there is no indication why does this happen.
Personally I agree with the people who said that the current behavior is too conservative:
The worst thing that can happen is that you get the same thing as you would get if you svn blame-ed a binary file which never had the correct mime-type set:
you would got a bunch of binary stuff printed on your window, the same thing that you would get if you cat-ed a binary file by mistake.
The worst thing that can be caused by that can be fixed by typing reset in your terminal AFAIK.
But as I said before, and don't really want to take that on my shoulder to get it changed.
 

The fact that someone told Subversion in the past that the file was binary
(by committing an svn:mime-type property with a non-text value) is part
of the history record and needs to be preserved in repository history.

Nobody talked about changing the history.
 

If setting the property was a mistake, the --force option can be used
as a workaround.

Yeah, I mentioned that in my previous mail as far as I can tell.
We are talking about here 
 
Or you can rewrite the repository history (e.g. with
svndumptool: http://svn.borg.ch/svndumptool/) to eradicate the mistake
from repository history.

I don't want to change the history, and a few sentences ago it seemed that you think that keeping it is important.
 

I don't think there is anything generally wrong with Subversion's
behaviour here.

I think that it is a little bit hard to defend the current conservative behavior, as it doesn't really prevent you from accidentally blaming a binary file, but it can prevent you from blaming a text file (even if it was a text file all along).
But again: I doesn't want to change that behavior, just add a hint so that the user know why does it happen and how can he overcome that.
Your suggested fix tell the user how to proceed but still not help him/her to understand why does he need --force for some files when from all he/she can understand it works for the exact same file without --force. 
 
It is being conservative in its assumptions about
what to do with data you give it for safekeeping, and it does what
it's being told to do (i.e. it is working as designed -- you may
disagree with this design but the current implementation reflects
the intended design).

see above
 

> It is really unintuitive to guess that blame doesn't work because the file
> HAD a binary mimetype in the past (and from a quick test it seems that the
> blame will be skipped even if the versions shown by blame all happened
> after the binary mime-type was removed).

This probably happens because Subversion needs to evaluate the entire
operative revision range of the blame operation for changes, even
if a revision in the range doesn't actually contribute changes to
the file (or contributes changes that are reversed by later revisions).

If you blame a revision range where none of the versions of the file
has a binary svn:mime-type property set, but you still get the error,
then I agree that this smells like an issue worth investigating. Is this
the case? If so, can you provide a reproduction script for this problem?

- added a file with one line of content, commited, 
- set the mime-type to application/xml, commited, 
- changed the line, commited, 
- removed the mime-type, commited,
- changed the only line, commited.
- tried to svn blame: Skipping binary file: 'test.xml'
- svn blame --force shows me a single line of content from the last revision


> So if possible I would prefer telling the user why is that happening.

What exactly do you mean here?

see above
 

If you mean that the error message should explain how binary files work,
I disagree. Users should be already informed about how Subversion handles
binary files when they run 'svn blame'. If users don't understand this
they should learn more about Subversion (e.g. by reading the documentation)
to make efficient use of it. I don't think having error messages substitute
for documentation is a good idea in general.

Could you show me a single line of documentation where it is mentioned that svn blame will skip the file if it ever had a binary mime-type in the history of that file?
I couldn't find that.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Stefan Sperling
On Thu, Jan 31, 2013 at 05:51:53PM +0100, Ferenc Kovacs wrote:
> On Thu, Jan 31, 2013 at 5:17 PM, Stefan Sperling <[hidden email]> wrote:
> > At any revision, the entire file content could be swapped out and binary
> > content be replaced with text and vice versa. In which case you'd set the
> > property in some revisions and not in others. It may make sense to
> > use 'svn delete' and 'svn add' in this case to make the replacement
> > explicit, but that's not a hard requirement.

> That is all true, but doesn't really change the fact that this behavior
> won't be obvious to the user and there is no indication why does this
> happen.

OK, I agree that it might not be obvious to someone who doesn't know
how blame actually works internally. It works by incrementally diffing
all revisions that changed the file to figure out which revision
contributed which line. Since a binary file doesn't have a notion
of what a 'line' really is, this approach doesn't work for binary
files. Neither does it work if one or more revisions contain binary
content. (I suppose it could also skip such revisions, but it's not
trivial to know whether such behaviour would be useful in the general
case.)

> I don't want to change the history, and a few sentences ago it seemed that
> you think that keeping it is important.

Some people want to change history, some don't. I didn't mean to
suggest that you'd have to do so. I pointed this out as another
possible workaround to consider.

> But again: I doesn't want to change that behavior, just add a hint so that
> the user know why does it happen and how can he overcome that.

So to be clear: You propose to make the error message point out that
Subversion considers a file binary because of certain svn:mime-type values?

I have an alternate proposal (see below).

> Your suggested fix tell the user how to proceed but still not help him/her
> to understand why does he need --force for some files when from all he/she
> can understand it works for the exact same file without --force.

Yes, the error message doesn't explain the underlying reason for the
problem. Neither do generic errors such as "file not found". I'm not
trying to give a silly example by mentioning "file not found", but
I'm trying to convey my point. You'll have some notion of possible
reasons why a file could be missing if you're accustomed to using a
file system on a computer. Same goes for people who are told 'this
file is binary' and understand how Subversion treats binary files.

> - added a file with one line of content, commited,
> - set the mime-type to application/xml, commited,

Here, the file is marked as binary. Subversion doesn't care whether
the content actually is binary but trusts the user's judgement on this.

I suppose we should improve Subversion's behaviour here by issuing a
warning if Subversion's own binary-file detection code doesn't identify
the file as binary when the user sets a binary mime-type.

What do you think about that?

svn: warning: 'application/xml' is a binary mime-type but file '%s' looks like text; diff, merge, blame, and other operations will stop working on this file

This way, we point out the cause of the problem when it first appears,
rather than later.

> - changed the line, commited,
> - removed the mime-type, commited,
> - changed the only line, commited.
> - tried to svn blame: Skipping binary file: 'test.xml'
> - svn blame --force shows me a single line of content from the last revision
 
> Could you show me a single line of documentation where it is mentioned that
> svn blame will skip the file if it ever had a binary mime-type in the
> history of that file?
> I couldn't find that.

It's in the FAQ, see http://subversion.apache.org/faq.html#binary-files

The book does indeed not seem to explicitly mention 'svn blame' in this
context. I think that should be fixed.
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Stefan Sperling
On Thu, Jan 31, 2013 at 06:36:09PM +0100, Stefan Sperling wrote:
> The book does indeed not seem to explicitly mention 'svn blame' in this
> context. I think that should be fixed.

Filed an issue with cmpilato's consent:
http://code.google.com/p/svnbook/issues/detail?id=183
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Ferenc Kovacs
In reply to this post by Stefan Sperling



On Thu, Jan 31, 2013 at 6:36 PM, Stefan Sperling <[hidden email]> wrote:
On Thu, Jan 31, 2013 at 05:51:53PM +0100, Ferenc Kovacs wrote:
> On Thu, Jan 31, 2013 at 5:17 PM, Stefan Sperling <[hidden email]> wrote:
> > At any revision, the entire file content could be swapped out and binary
> > content be replaced with text and vice versa. In which case you'd set the
> > property in some revisions and not in others. It may make sense to
> > use 'svn delete' and 'svn add' in this case to make the replacement
> > explicit, but that's not a hard requirement.

> That is all true, but doesn't really change the fact that this behavior
> won't be obvious to the user and there is no indication why does this
> happen.

OK, I agree that it might not be obvious to someone who doesn't know
how blame actually works internally. It works by incrementally diffing
all revisions that changed the file to figure out which revision
contributed which line.

Totally understandable, nothing to discuss here.
 
Since a binary file doesn't have a notion
of what a 'line' really is, this approach doesn't work for binary
files. Neither does it work if one or more revisions contain binary
content.

Same here.
 
(I suppose it could also skip such revisions, but it's not
trivial to know whether such behaviour would be useful in the general
case.)

This was something discussed in the old thread, but I don't liked that personally.

 

> I don't want to change the history, and a few sentences ago it seemed that
> you think that keeping it is important.

Some people want to change history, some don't. I didn't mean to
suggest that you'd have to do so. I pointed this out as another
possible workaround to consider.

I think as long as changing history as cumbersome as it is (last time I checked it was a third party python script not guaranteed to work, now you mentioned SvnDumpTool, which seems to be a little bit more mature) this would be both an overkill and a major feat for the average subversion user to accomplish. 
 

> But again: I doesn't want to change that behavior, just add a hint so that
> the user know why does it happen and how can he overcome that.

So to be clear: You propose to make the error message point out that
Subversion considers a file binary because of certain svn:mime-type values?

yep.
 

I have an alternate proposal (see below).

all ears.
 

> Your suggested fix tell the user how to proceed but still not help him/her
> to understand why does he need --force for some files when from all he/she
> can understand it works for the exact same file without --force.

Yes, the error message doesn't explain the underlying reason for the
problem. Neither do generic errors such as "file not found". I'm not
trying to give a silly example by mentioning "file not found", but
I'm trying to convey my point. You'll have some notion of possible
reasons why a file could be missing if you're accustomed to using a
file system on a computer. Same goes for people who are told 'this
file is binary' and understand how Subversion treats binary files.

File not found in itself isn't that confusing, and if you even have an absolute path, you are good to go 
A more comparable error message which comes to mind when executing a 32bit binary on a 64bit linux installation would error out with No such file or directory.
That sounds like something the user thinks he/she is familiar with, but the problem lies totally elsewhere.
That is bad interface design, because the error message doesn't help resolving the issue, and it is common enough, that you won't find even on google, if you don't know what you are looking for.
 

> - added a file with one line of content, commited,
> - set the mime-type to application/xml, commited,

Here, the file is marked as binary. Subversion doesn't care whether
the content actually is binary but trusts the user's judgement on this.

Yeah, I know this, but misunderstood your mail, and just realized my mistake:
I thought that subversion blame shouldn't look further back than the oldest revision in the blame output, but you wrote that isn't the case, but it shouldn't look further back if you specify revision range explicitly, and that indeed work and not skips the file.
 

I suppose we should improve Subversion's behaviour here by issuing a
warning if Subversion's own binary-file detection code doesn't identify
the file as binary when the user sets a binary mime-type.

What do you think about that?

svn: warning: 'application/xml' is a binary mime-type but file '%s' looks like text; diff, merge, blame, and other operations will stop working on this file

This way, we point out the cause of the problem when it first appears,
rather than later.

Yeah, that is something nice from the user POV, as it would lessen the chance of screwups.
Would it be possible to tell them to somehow show in there that only plain/*, image/x-xbitmap and image/x-xpixmap mimetypes can be used for those operations?
Ofc. this still wouldn't help much if you already has a file with a "wrong" mimetype and bump into the Skipping binary file error message.


> - changed the line, commited,
> - removed the mime-type, commited,
> - changed the only line, commited.
> - tried to svn blame: Skipping binary file: 'test.xml'
> - svn blame --force shows me a single line of content from the last revision

> Could you show me a single line of documentation where it is mentioned that
> svn blame will skip the file if it ever had a binary mime-type in the
> history of that file?
> I couldn't find that.

It's in the FAQ, see http://subversion.apache.org/faq.html#binary-files

The book does indeed not seem to explicitly mention 'svn blame' in this
context. I think that should be fixed.

I think that the FAQ is pretty clear in that regard that somebody who have read that has some/good chances that he/she can realize that what are the implication of the mimetype before the commit.
But when the deed is already done (and possibly by someone else in the past long forgotten everybody else but the subversion repo) it is really unintuitive to realize right away that Skipping the binary file means that the file is or WAS a binary file sometimes in the past.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Ferenc Kovacs
In reply to this post by Stefan Sperling



On Thu, Jan 31, 2013 at 6:46 PM, Stefan Sperling <[hidden email]> wrote:
On Thu, Jan 31, 2013 at 06:36:09PM +0100, Stefan Sperling wrote:
> The book does indeed not seem to explicitly mention 'svn blame' in this
> context. I think that should be fixed.

Filed an issue with cmpilato's consent:
http://code.google.com/p/svnbook/issues/detail?id=183

Thanks for your time and patience.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Stefan Sperling
In reply to this post by Ferenc Kovacs
On Thu, Jan 31, 2013 at 07:35:07PM +0100, Ferenc Kovacs wrote:

> > I suppose we should improve Subversion's behaviour here by issuing a
> > warning if Subversion's own binary-file detection code doesn't identify
> > the file as binary when the user sets a binary mime-type.
> >
> > What do you think about that?
> >
> > svn: warning: 'application/xml' is a binary mime-type but file '%s' looks
> > like text; diff, merge, blame, and other operations will stop working on
> > this file
> >
> > This way, we point out the cause of the problem when it first appears,
> > rather than later.
> >
>
> Yeah, that is something nice from the user POV, as it would lessen the
> chance of screwups.

Ok great! I'll try to make this happen.

> Would it be possible to tell them to somehow show in there that only
> plain/*, image/x-xbitmap and image/x-xpixmap mimetypes can be used for
> those operations?

I think that would be too detailed. I also don't like the fact that
there is a list of mime-types which are considered text.

The official mime-type list is not maintained by Subversion, and it would
be silly to attempt to keep up-to-date with the official list.

Because of that, I think it was a mistake to document these two 'image/*'
types as text. Just saying "no mime-type property or a text/* mime-type
property indicates textual content" would have been better.

> Ofc. this still wouldn't help much if you already has a file with a "wrong"
> mimetype and bump into the Skipping binary file error message.

Fair enough.
I'd be willing to extend the existing error message as follows:

  Skipping file which is considered binary in one or more revisions (use --force to treat as text): 'foo/bar.xml'

Would that be good enough?
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Ferenc Kovacs



On Thu, Jan 31, 2013 at 7:57 PM, Stefan Sperling <[hidden email]> wrote:
On Thu, Jan 31, 2013 at 07:35:07PM +0100, Ferenc Kovacs wrote:
> > I suppose we should improve Subversion's behaviour here by issuing a
> > warning if Subversion's own binary-file detection code doesn't identify
> > the file as binary when the user sets a binary mime-type.
> >
> > What do you think about that?
> >
> > svn: warning: 'application/xml' is a binary mime-type but file '%s' looks
> > like text; diff, merge, blame, and other operations will stop working on
> > this file
> >
> > This way, we point out the cause of the problem when it first appears,
> > rather than later.
> >
>
> Yeah, that is something nice from the user POV, as it would lessen the
> chance of screwups.

Ok great! I'll try to make this happen.

ty.
 

> Would it be possible to tell them to somehow show in there that only
> plain/*, image/x-xbitmap and image/x-xpixmap mimetypes can be used for
> those operations?

I think that would be too detailed. I also don't like the fact that
there is a list of mime-types which are considered text.

yeah, I also thought until now that only plain/* is considered as text for svn.
 

The official mime-type list is not maintained by Subversion, and it would
be silly to attempt to keep up-to-date with the official list.

uhm, you mean this isn't maintened by the subversion team?

maybe you are confusing this with the optional libmagic which is used for distinguishing between the various binary types? 
 

Because of that, I think it was a mistake to document these two 'image/*'
types as text. Just saying "no mime-type property or a text/* mime-type
property indicates textual content" would have been better.

I think that they should either no exceptions but only plain/* or they should be documented as they are currently.
 

> Ofc. this still wouldn't help much if you already has a file with a "wrong"
> mimetype and bump into the Skipping binary file error message.

Fair enough.
I'd be willing to extend the existing error message as follows:

  Skipping file which is considered binary in one or more revisions (use --force to treat as text): 'foo/bar.xml'

Would that be good enough?

yes!

ps: I've just realized that the gmail reply all feature was ccing myself for my every reply. weird.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Philip Martin-4
In reply to this post by Stefan Sperling
Stefan Sperling <[hidden email]> writes:

> OK, I agree that it might not be obvious to someone who doesn't know
> how blame actually works internally. It works by incrementally diffing
> all revisions that changed the file to figure out which revision
> contributed which line. Since a binary file doesn't have a notion
> of what a 'line' really is, this approach doesn't work for binary
> files. Neither does it work if one or more revisions contain binary
> content.

What exactly goes wrong?  The current revision is text, not binary, and
the final output is the current file.  No part of the binary file gets
into the final output.

Suppose I have a file that really was binary in the past, perhaps a
shell script that used to be an ELF binary.  When blame reaches the
binary revision the binary data is likely to get treated as one or more
lines of text, none of which match the current text.  At that point the
blame algorithm is complete.  Isn't that the right answer?

I see I asked this question in the original thread but I don't see any
answer.

--
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Philip Martin-4
Philip Martin <[hidden email]> writes:

> Stefan Sperling <[hidden email]> writes:
>
>> OK, I agree that it might not be obvious to someone who doesn't know
>> how blame actually works internally. It works by incrementally diffing
>> all revisions that changed the file to figure out which revision
>> contributed which line. Since a binary file doesn't have a notion
>> of what a 'line' really is, this approach doesn't work for binary
>> files. Neither does it work if one or more revisions contain binary
>> content.
>
> What exactly goes wrong?  The current revision is text, not binary, and
> the final output is the current file.  No part of the binary file gets
> into the final output.
>
> Suppose I have a file that really was binary in the past, perhaps a
> shell script that used to be an ELF binary.  When blame reaches the
> binary revision the binary data is likely to get treated as one or more
> lines of text, none of which match the current text.  At that point the
> blame algorithm is complete.  Isn't that the right answer?
>
> I see I asked this question in the original thread but I don't see any
> answer.

Since there appears to be no reason to check the mime-type of anything
other than the final output I made blame behave that way in r1445164.

--
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download
Reply | Threaded
Open this post in threaded view
|

RE: svn blame not working for files which had binary mime-type in a previous revision

Bert Huijben-5
In reply to this post by Ferenc Kovacs
This might not work as good as expected, as our blame always works old to new... so we will try to blame the elf binary *before* the shell script.
 
(and this isn’t easy to fix because the ra layer currently can only deliver the file versions in that direction)
 
A patch to libsvn_repos would be very welcome to allow switching to a more efficient walk in a future subversion version.
 
Bert
 
From: Philip Martin <[hidden email]>
Sent: February 12, 2013 2:32 PM
To: Ferenc Kovacs <[hidden email]>
CC: [hidden email]
Subject: Re: svn blame not working for files which had binary mime-type in a previous revision
 
Philip Martin <[hidden email]> writes:

> Stefan Sperling <[hidden email]> writes:
>
>> OK, I agree that it might not be obvious to someone who doesn't know
>> how blame actually works internally. It works by incrementally diffing
>> all revisions that changed the file to figure out which revision
>> contributed which line. Since a binary file doesn't have a notion
>> of what a 'line' really is, this approach doesn't work for binary
>> files. Neither does it work if one or more revisions contain binary
>> content.
>
> What exactly goes wrong?  The current revision is text, not binary, and
> the final output is the current file.  No part of the binary file gets
> into the final output.
>
> Suppose I have a file that really was binary in the past, perhaps a
> shell script that used to be an ELF binary.  When blame reaches the
> binary revision the binary data is likely to get treated as one or more
> lines of text, none of which match the current text.  At that point the
> blame algorithm is complete.  Isn't that the right answer?
>
> I see I asked this question in the original thread but I don't see any
> answer.

Since there appears to be no reason to check the mime-type of anything
other than the final output I made blame behave that way in r1445164.

--
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download
Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Ferenc Kovacs
In reply to this post by Philip Martin-4


2013.02.12. 14:31, "Philip Martin" <[hidden email]> ezt írta:
>
> Philip Martin <[hidden email]> writes:
>
> > Stefan Sperling <[hidden email]> writes:
> >
> >> OK, I agree that it might not be obvious to someone who doesn't know
> >> how blame actually works internally. It works by incrementally diffing
> >> all revisions that changed the file to figure out which revision
> >> contributed which line. Since a binary file doesn't have a notion
> >> of what a 'line' really is, this approach doesn't work for binary
> >> files. Neither does it work if one or more revisions contain binary
> >> content.
> >
> > What exactly goes wrong?  The current revision is text, not binary, and
> > the final output is the current file.  No part of the binary file gets
> > into the final output.
> >
> > Suppose I have a file that really was binary in the past, perhaps a
> > shell script that used to be an ELF binary.  When blame reaches the
> > binary revision the binary data is likely to get treated as one or more
> > lines of text, none of which match the current text.  At that point the
> > blame algorithm is complete.  Isn't that the right answer?
> >
> > I see I asked this question in the original thread but I don't see any
> > answer.
>
> Since there appears to be no reason to check the mime-type of anything
> other than the final output I made blame behave that way in r1445164.
>

thanks, that is more than I hoped.

Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Philip Martin-4
In reply to this post by Bert Huijben-5
I'm still not clear what would go wrong.

Each binary files get treated as one or more long lines.  If there are
multiple versions of the binary we will calculate diffs between these
lines.  The lines and the diffs are more or less meaningless.  At some
point the file becomes text and replaces all the binary lines.  At that
point the blame algorithm will discard the meaningless diffs and we get
the correct output.

What would go wrong?  Is the diff algorithm going to fall over on very
long lines?  I'm not aware of that being a limitation.  I suppose the
binary files might trigger the problem case where the minimal diff takes
a really long time to run.  We get that problem on text files, typically
machine generated, but I suppose some binary files might also trigger
it.  Is that going to be a significant problem?

Bert Huijben <[hidden email]> writes:

> This might not work as good as expected, as our blame always works old to
> new... so we will try to blame the elf binary *before* the shell script.
>
> (and this isn’t easy to fix because the ra layer currently can only deliver
> the file versions in that direction)
>
> A patch to libsvn_repos would be very welcome to allow switching to a more
> efficient walk in a future subversion version.
>
> Bert
>
>  *From:* Philip Martin <[hidden email]>
> *Sent:* February 12, 2013 2:32 PM
> *To:* Ferenc Kovacs <[hidden email]>
> *CC:* [hidden email]
> *Subject:* Re: svn blame not working for files which had binary mime-type
> in a previous revision
>
> Philip Martin <[hidden email]> writes:
>
>> Stefan Sperling <[hidden email]> writes:
>>
>>> OK, I agree that it might not be obvious to someone who doesn't know
>>> how blame actually works internally. It works by incrementally diffing
>>> all revisions that changed the file to figure out which revision
>>> contributed which line. Since a binary file doesn't have a notion
>>> of what a 'line' really is, this approach doesn't work for binary
>>> files. Neither does it work if one or more revisions contain binary
>>> content.
>>
>> What exactly goes wrong?  The current revision is text, not binary, and
>> the final output is the current file.  No part of the binary file gets
>> into the final output.
>>
>> Suppose I have a file that really was binary in the past, perhaps a
>> shell script that used to be an ELF binary.  When blame reaches the
>> binary revision the binary data is likely to get treated as one or more
>> lines of text, none of which match the current text.  At that point the
>> blame algorithm is complete.  Isn't that the right answer?
>>
>> I see I asked this question in the original thread but I don't see any
>> answer.
>
> Since there appears to be no reason to check the mime-type of anything
> other than the final output I made blame behave that way in r1445164.
>
> --
> Certified & Supported Apache Subversion Downloads:
> http://www.wandisco.com/subversion/download

--
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download
Reply | Threaded
Open this post in threaded view
|

RE: svn blame not working for files which had binary mime-type in a previous revision

Bert Huijben-5


> -----Original Message-----
> From: MARTIN PHILIP [mailto:[hidden email]] On Behalf Of
> Philip Martin
> Sent: dinsdag 12 februari 2013 19:30
> To: Bert Huijben
> Cc: Ferenc Kovacs; [hidden email]
> Subject: Re: svn blame not working for files which had binary mime-type in a
> previous revision
>
> I'm still not clear what would go wrong.

From an earlier mail (copied from below):
>> Suppose I have a file that really was binary in the past, perhaps a
>> shell script that used to be an ELF binary.  When blame reaches the
>> binary revision the binary data is likely to get treated as one or more
>> lines of text, none of which match the current text.  At that point the
>> blame algorithm is complete.  Isn't that the right answer?

This assumes that we blame backwards, while we really run blame forwards.

For many cases running it backwards would be more efficient (e.g. you could just ask for the lines in  the last 100 changed revisions)... Or in your own code stop when one specific interesting line gets changed and cancel the incoming data.

But that requires extending the api that is below the svn_ra_file_revisions() api to allow passing a reversed range.

        Bert

Reply | Threaded
Open this post in threaded view
|

Re: svn blame not working for files which had binary mime-type in a previous revision

Johan Corveleyn-3
On Tue, Feb 12, 2013 at 9:25 PM, Bert Huijben <[hidden email]> wrote:

>
>
>> -----Original Message-----
>> From: MARTIN PHILIP [mailto:[hidden email]] On Behalf Of
>> Philip Martin
>> Sent: dinsdag 12 februari 2013 19:30
>> To: Bert Huijben
>> Cc: Ferenc Kovacs; [hidden email]
>> Subject: Re: svn blame not working for files which had binary mime-type in a
>> previous revision
>>
>> I'm still not clear what would go wrong.
>
> From an earlier mail (copied from below):
>>> Suppose I have a file that really was binary in the past, perhaps a
>>> shell script that used to be an ELF binary.  When blame reaches the
>>> binary revision the binary data is likely to get treated as one or more
>>> lines of text, none of which match the current text.  At that point the
>>> blame algorithm is complete.  Isn't that the right answer?
>
> This assumes that we blame backwards, while we really run blame forwards.
>
> For many cases running it backwards would be more efficient (e.g. you could just ask for the lines in  the last 100 changed revisions)... Or in your own code stop when one specific interesting line gets changed and cancel the incoming data.
>
> But that requires extending the api that is below the svn_ra_file_revisions() api to allow passing a reversed range.
>

Mmmmm, making "reverse blame" possible ... would be very nice indeed
(and it's been on my radar ever since I started hacking on SVN --
unfortunately I can't seem to make the time to work on this).

Apparently there is a (very old) patch by Daniel Berlin, which adds
this feature (including the RA functionality). It's attached to issue
#2138 [1]. Unfortunately, Daniel decided later (see [2]) that it
wouldn't be worth the code churn at that time, because it didn't make
that much of a difference in measurements. Perhaps it would be more
interesting nowadays, because other parts of blame have been made
faster a lot already (because of diff performance).

I don't know if it's feasible to revive that patch to current code.
Perhaps taking only the RA part would be doable (and we can postpone
the client side algorithm to a later release)?

[1] http://subversion.tigris.org/issues/show_bug.cgi?id=2138 - finish
the new, faster blame algorithm

[2] http://svn.haxx.se/dev/archive-2005-09/0797.shtml

--
Johan